[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure

2018-05-11 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472932#comment-16472932
 ] 

Bibin A Chundatt commented on YARN-7892:


[~Naganarasimha]

TestPipeApplication timeout is happening in trunk too.

Overall the patch looks good. Will commit later today if no objections.



> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch, 
> YARN-7892-YARN-3409.007.patch, YARN-7892-YARN-3409.008.patch, 
> YARN-7892-YARN-3409.009.patch, YARN-7892-YARN-3409.010.patch
>
>
> In the existing structure, we kept the type and value along with the 
> attribute, which confuses users trying to understand the APIs: it is not 
> clear what needs to be sent for type and value while fetching the mappings 
> for node(s).
> In addition, equals will not make sense when we compare only prefix and 
> name, whereas their values might be different.
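
To make the proposed split concrete, here is a minimal sketch, assuming 
hypothetical class and field names rather than the actual patch: the 
identifying fields (prefix and name) move into a key type whose equals and 
hashCode ignore type and value.

{code:java}
// Hypothetical sketch: identity lives in a key (prefix + name) so that
// equals/hashCode are well-defined regardless of type and value.
public class NodeAttributeKey {
  private final String prefix;
  private final String name;

  public NodeAttributeKey(String prefix, String name) {
    this.prefix = prefix;
    this.name = name;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof NodeAttributeKey)) {
      return false;
    }
    NodeAttributeKey other = (NodeAttributeKey) o;
    return prefix.equals(other.prefix) && name.equals(other.name);
  }

  @Override
  public int hashCode() {
    return 31 * prefix.hashCode() + name.hashCode();
  }
}
{code}

The attribute itself would then carry the key plus the type/value payload, so 
two attributes with the same key but different values still refer to the same 
node attribute.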






[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472927#comment-16472927
 ] 

Hudson commented on YARN-8265:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14181 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14181/])
YARN-8265.  Improve DNS handling on docker IP changes. (eyang: rev 
0ff94563b9b62d0426d475dc0f84152b68f1ff0d)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerClient.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceAM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/MockServiceAM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java


> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch, 
> YARN-8265.003.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.
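
The suspected fix lends itself to a small sketch, with hypothetical names 
rather than the real service AM classes: keep a handle to the scheduled 
status poller and reschedule it from the restart callback if it was cancelled.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual patch: restart the status poller
// when the NM reports a container restart so the new IP is picked up.
class ContainerIpWatcher {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private volatile ScheduledFuture<?> poller;

  void startRetriever(String containerId) {
    poller = scheduler.scheduleAtFixedRate(
        () -> System.out.println("polling status of " + containerId),
        0, 1, TimeUnit.SECONDS);
  }

  // Invoked from the AM's onContainerRestart callback.
  void onContainerRestart(String containerId) {
    if (poller == null || poller.isCancelled()) {
      startRetriever(containerId); // fetch the container's new IP
    }
  }
}
{code}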






[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472923#comment-16472923
 ] 

Eric Yang commented on YARN-8265:
-

+1 looks good to me.  I just committed this on trunk and branch-3.1.
Thank you [~billie.rinaldi] for the review and patch.

> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch, 
> YARN-8265.003.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8265:

Target Version/s: 3.2.0, 3.1.1  (was: 3.2.0)
   Fix Version/s: 3.1.1
  3.2.0

> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch, 
> YARN-8265.003.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472901#comment-16472901
 ] 

Eric Yang commented on YARN-8265:
-

"onContainerRestart" event is currently not working.  Therefore the workaround 
solution is the only feasible solution.  Therefore, I am inclined to commit the 
patch 003 for 3.1.1 release.

> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch, 
> YARN-8265.003.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472895#comment-16472895
 ] 

genericqa commented on YARN-4599:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
34m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
58s{color} | {color:green} root: The patch generated 0 new + 230 unchanged - 1 
fixed = 230 total (was 231) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 19m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
25s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}173m 16s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
52s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}376m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | 
hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor |
\\
\\
|| Subsystem || Report/Notes ||
| 

[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-11 Thread Íñigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472863#comment-16472863
 ] 

Íñigo Goiri commented on YARN-8275:
---

[~miklos.szeg...@cloudera.com], we would have to decide how to write the native 
service, and that opens a big design space.
Do you have any proposal for that?
In any case, I would make this pluggable so that we can rely on winutils, a 
separate service, or JNI, as sketched below.
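
As a sketch of what "pluggable" could look like, assuming a hypothetical 
interface that does not exist in YARN today, the NM would code against one 
abstraction with winutils, a separate service, or JNI behind it:

{code:java}
// Hypothetical sketch of a pluggable native-operations layer.
interface WindowsOps {
  void createSymlink(String target, String link) throws Exception;
  boolean isTaskAlive(String taskName) throws Exception;
}

// Backend 1: shell out to winutils.exe (roughly the current behavior).
class WinUtilsOps implements WindowsOps {
  public void createSymlink(String target, String link) throws Exception {
    new ProcessBuilder("winutils.exe", "symlink", link, target)
        .inheritIO().start().waitFor();
  }
  public boolean isTaskAlive(String taskName) throws Exception {
    return new ProcessBuilder("winutils.exe", "task", "isAlive", taskName)
        .inheritIO().start().waitFor() == 0;
  }
}

// Backend 2: JNI bindings into a native library (name is hypothetical).
class JniOps implements WindowsOps {
  static { System.loadLibrary("yarnwinops"); }
  public native void createSymlink(String target, String link);
  public native boolean isTaskAlive(String taskName);
}
{code}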

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average, the NM calls WinUtils 4.76 times per second and 65.51 times per 
> container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DLL ops.
> At ~4.76 executions per second, this means *666.58* IO ops/second due to 
> WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.






[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472839#comment-16472839
 ] 

Eric Yang commented on YARN-7654:
-

[~jlowe] Thank you for the great reviews and commit.
[~shaneku...@gmail.com] [~Jim_Brennan] [~ebadger] Thank you for the reviews.

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, 
> YARN-7654.024.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation. It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the 
> docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}
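
A rough sketch of the proposed selection logic, using hypothetical helper 
names rather than the actual runtime code:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two launch modes described above.
class DockerLaunchPlanner {
  List<List<String>> plan(String image, String version,
                          String containerId, String launchCommand) {
    List<List<String>> commands = new ArrayList<>();
    commands.add(Arrays.asList("docker", "run", image + ":" + version));
    if (launchCommand != null && !launchCommand.isEmpty()) {
      // launch_command exists: exec it inside the started container.
      List<String> exec = new ArrayList<>(
          Arrays.asList("docker", "exec", containerId));
      exec.addAll(Arrays.asList(launchCommand.split("\\s+")));
      commands.add(exec);
    }
    // Otherwise rely on the ENTRY_POINT predefined in the image.
    return commands;
  }
}
{code}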






[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472837#comment-16472837
 ] 

Eric Yang commented on YARN-8265:
-

[~billie.rinaldi] I am struggling to understand why the node manager would 
decide to restart the docker container without consulting the application 
master. The AM decides the state of the containers, and the node manager only 
follows orders from the AM. This helps prevent race conditions between the AM 
and NM over which container should stay up and running, and the AM follows 
state transitions to ensure it stays on a pre-defined path. With container 
relaunch implemented in YARN-7973, the AM still decides when to restart a 
container, and the "onContainerRestart" event will be received by the AM. If 
we run ContainerStartedTransition again, it will check for IP changes and 
cancel the scheduled timer thread. I think this leads to a more desirable 
outcome without leaving the timer thread open-ended.

An alternative approach is to move the ContainerStatusRetriever to 
ContainerBecomeReadyTransition, and use the BECOME_READY transition to check 
for the IP address.

> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch, 
> YARN-8265.003.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472835#comment-16472835
 ] 

genericqa commented on YARN-8265:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 14s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core:
 The patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
15s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8265 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923110/YARN-8265.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 09175230dafa 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4b4f24a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20710/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20710/testReport/ |
| Max. process+thread count | 777 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applicati

[jira] [Commented] (YARN-8271) Change UI2 labeling of certain tables to avoid confusion

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472833#comment-16472833
 ] 

genericqa commented on YARN-8271:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
37m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8271 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922720/YARN-8271.0001.patch |
| Optional Tests |  asflicense  shadedclient  |
| uname | Linux c794f8a44b0b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4b4f24a |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20711/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Change UI2 labeling of certain tables to avoid confusion
> 
>
> Key: YARN-8271
> URL: https://issues.apache.org/jira/browse/YARN-8271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8271.0001.patch
>
>
> Update labeling for a few items to avoid confusion
>  - Cluster Page (/cluster-overview):
>  -- "Finished apps" --> "Finished apps from all users"
>  -- "Running apps" --> "Running apps from all users"
>  - Queues overview page (/yarn-queues/root) && Per queue page 
> (/yarn-queue/root/apps)
>  -- "Running Apps" --> "Running apps from all users in queue "
>  - Nodes Page - side bar for all pages 
>  -- "List of Applications" --> "List of Applications on this node"
>  -- "List of Containers" --> "List of Containers on this node"
>  - Yarn Tools
>  ** Yarn Tools --> YARN Tools
>  - Queue page
>  ** Running Apps: --> Running Apps From All Users






[jira] [Commented] (YARN-8266) Clicking on application from cluster view should redirect to application attempt page

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472830#comment-16472830
 ] 

genericqa commented on YARN-8266:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
32m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 44m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8266 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12922566/YARN-8266.001.patch |
| Optional Tests |  asflicense  shadedclient  |
| uname | Linux 2c9915d5c292 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4b4f24a |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 407 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20713/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Clicking on application from cluster view should redirect to application 
> attempt page
> -
>
> Key: YARN-8266
> URL: https://issues.apache.org/jira/browse/YARN-8266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8266.001.patch
>
>
> Steps:
> 1) Start one application
>  2) Go to cluster overview page
>  3) Click on applicationId from Cluster Resource Usage By Application
> This action redirects to the 
> [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] URL. This is 
> an invalid URL and does not show any details.
> Instead, it should redirect to the attempt page.






[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472822#comment-16472822
 ] 

Hudson commented on YARN-7654:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14180 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14180/])
YARN-7654. Support ENTRY_POINT for docker container. Contributed by Eric 
Yang (jlowe: rev 6c8e51ca7eaaeef0626658b3c45d446a537e4dc0)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerClient.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRunCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/AbstractProviderService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/docker/DockerProviderService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java


> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, 
> YARN-7654.024.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation. It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the 
> docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}






[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472814#comment-16472814
 ] 

Jason Lowe commented on YARN-7654:
--

Thanks for updating the patch!  The unit test failure does not appear to be 
related, and the test passes for me with the patch applied.  Looks like it is a 
known flaky test according to YARN-7145.

+1 for patch 024.  Committing this.

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, 
> YARN-7654.024.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation. It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the 
> docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}






[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472808#comment-16472808
 ] 

Billie Rinaldi commented on YARN-8265:
--

Patch 3 fixes the remaining issues I have identified. One more question: a 
status retriever runs once per second for each container. Is this still 
appropriate now that they will run forever for docker containers?
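
For reference, one way to keep a long-lived retriever cheap would be a capped 
backoff between polls instead of a fixed one-second period. This is 
illustrative only, not part of the patch:

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: reschedule each poll with a growing, capped delay
// so an idle retriever costs less the longer it runs.
class BackoffPoller {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void poll(Runnable check, long delaySeconds) {
    scheduler.schedule(() -> {
      check.run();
      poll(check, Math.min(delaySeconds * 2, 30)); // cap at 30s
    }, delaySeconds, TimeUnit.SECONDS);
  }
}
{code}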

> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch, 
> YARN-8265.003.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2018-05-11 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-4599:
-
Attachment: Elastic Memory Control in YARN.pdf

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroup enforcement support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.
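
For context, disabling the cgroup v1 OOM killer is a single write to the 
memory controller. A minimal sketch with a hypothetical cgroup path; the 
actual patch is more involved:

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: writing "1" to memory.oom_control (cgroup v1) pauses tasks
// that exceed their limit instead of killing them, leaving the NM free
// to decide which container to reclaim.
class OomControl {
  static void disableOomKiller(String cgroupName) throws Exception {
    Path file = Paths.get("/sys/fs/cgroup/memory", cgroupName,
        "memory.oom_control");
    Files.write(file, "1".getBytes());
  }
}
{code}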






[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472806#comment-16472806
 ] 

genericqa commented on YARN-7654:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
18s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 36s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 14 new + 89 unchanged - 0 fixed = 103 total (was 89) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 30s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
55s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
25s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {

[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8265:
-
Attachment: YARN-8265.003.patch

> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch, 
> YARN-8265.003.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-05-11 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472786#comment-16472786
 ] 

Rohith Sharma K S commented on YARN-8130:
-

Test failures are unrelated to this patch. [~haibochen] / [~vrushalic], could 
you please help commit this?

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8130.01.patch, YARN-8130.02.patch, 
> YARN-8130.03.patch
>
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02*, is the one whose finished event is 
> missing.






[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472780#comment-16472780
 ] 

Eric Yang commented on YARN-8274:
-

[~jlowe] Thank you for all your efforts.  It is greatly appreciated.

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated a container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. I haven't figured out why this 
> is happening, but it seems to have been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}
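
The shell error suggests the relaunch path assembled "docker <container_id> 
..." without a subcommand, so docker parsed the container id as its 
subcommand. A hypothetical illustration of that failure mode, not the actual 
container-executor code:

{code:java}
import java.util.Arrays;
import java.util.List;

// If the subcommand is dropped when the argument list is rebuilt for
// relaunch, docker sees the container id first and rejects it.
class DockerCmdDemo {
  public static void main(String[] args) {
    List<String> good = Arrays.asList(
        "docker", "start", "container_1525897486447_0003_01_02");
    List<String> bad = Arrays.asList(   // subcommand lost on relaunch
        "docker", "container_1525897486447_0003_01_02");
    System.out.println(String.join(" ", good)); // valid invocation
    System.out.println(String.join(" ", bad));  // "is not a docker command"
  }
}
{code}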






[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472779#comment-16472779
 ] 

genericqa commented on YARN-8080:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
0s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 92 new + 121 unchanged - 2 fixed = 213 total (was 123) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
47s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
37s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8080 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attac

[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472777#comment-16472777
 ] 

Jason Lowe commented on YARN-8274:
--

bq.  It would be nice if the code were refactored to add docker_binary in 
construct_docker_command to avoid duplicating add_to_args for docker_binary in 
all the get_docker_*_command helpers, but the priority is to get a good stable 
state for release.

I was thinking the exact same thing as I was writing the patch.  I went for the 
simple approach to keep the patch small and easy to review since it's a bugfix. 
 I filed YARN-8284 to track that.

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}






[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472778#comment-16472778
 ] 

Eric Yang commented on YARN-7654:
-

[~jlowe] All 5 scenarios passed in my local Kerberos-enabled cluster tests.

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, 
> YARN-7654.024.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation.  It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the 
> docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}






[jira] [Created] (YARN-8284) get_docker_command refactoring

2018-05-11 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8284:


 Summary: get_docker_command refactoring
 Key: YARN-8284
 URL: https://issues.apache.org/jira/browse/YARN-8284
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.2.0, 3.1.1
Reporter: Jason Lowe


YARN-8274 occurred because get_docker_command's helper functions each have to 
remember to put the docker binary as the first argument.  This is error prone 
and causes code duplication for each of the helper functions.  It would be 
safer and simpler if get_docker_command initialized the docker binary argument 
in one place and each of the helper functions only added the arguments specific 
to their particular docker sub-command.
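The helper functions in question are C code in the container-executor, but the 
shape of the proposed refactor is easy to sketch. The following Java sketch is 
purely illustrative (all names are invented; nothing here is the actual 
container-executor API): the dispatcher seeds the shared first argument exactly 
once, so no per-sub-command helper can forget it.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the YARN-8284 idea, not real Hadoop code: seed the
// docker binary once in the dispatcher instead of in every helper.
public final class DockerCommandSketch {
  private static final String DOCKER_BINARY = "/usr/bin/docker";

  // Mirrors get_docker_command: the shared prefix is initialized in one place.
  static List<String> getDockerCommand(String subCommand, String... args) {
    List<String> cmd = new ArrayList<>();
    cmd.add(DOCKER_BINARY);                  // done exactly once
    switch (subCommand) {
      case "run":     addRunArgs(cmd, args); break;
      case "inspect": addInspectArgs(cmd, args); break;
      default: throw new IllegalArgumentException(subCommand);
    }
    return cmd;
  }

  // Helpers now append only their sub-command-specific arguments.
  private static void addRunArgs(List<String> cmd, String... args) {
    cmd.add("run");
    for (String a : args) {
      cmd.add(a);
    }
  }

  private static void addInspectArgs(List<String> cmd, String... args) {
    cmd.add("inspect");
    for (String a : args) {
      cmd.add(a);
    }
  }

  public static void main(String[] unused) {
    // [/usr/bin/docker, inspect, some_container]
    System.out.println(getDockerCommand("inspect", "some_container"));
  }
}
{code}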






[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472772#comment-16472772
 ] 

Chandni Singh commented on YARN-8141:
-

We had a discussion offline and agreed that:
* [~leftnoteasy]'s use-case is unblocked by providing configuration files in 
the spec.
* We will convert this jira to remove 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}}; 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} can support user-specified mounts of 
localized resources as well. 
* This jira will be marked for the next release.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java
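As noted in the comment above, this jira was later repurposed, so the committed 
fix differs; purely as a sketch of how the overwrite could be avoided (variable 
names follow the snippet above, nothing else is from the patch), the env value 
could be appended to rather than replaced:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: preserve a user-specified value instead of overwriting it.
public final class MountEnvSketch {
  static final String KEY =
      "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";

  static void addResourceMounts(Map<String, String> env,
      Map<String, String> mountPaths) {
    StringBuilder sb = new StringBuilder();
    String existing = env.get(KEY);
    if (existing != null && !existing.isEmpty()) {
      sb.append(existing);                   // keep what the user specified
    }
    for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
      if (sb.length() > 0) {
        sb.append(",");
      }
      sb.append(mount.getKey()).append(":").append(mount.getValue());
    }
    env.put(KEY, sb.toString());
  }

  public static void main(String[] args) {
    Map<String, String> env = new LinkedHashMap<>();
    env.put(KEY, "/etc/passwd:/etc/passwd");          // user-specified mount
    Map<String, String> computed = new LinkedHashMap<>();
    computed.put("/local/resource", "/container/resource");
    addResourceMounts(env, computed);
    // /etc/passwd:/etc/passwd,/local/resource:/container/resource
    System.out.println(env.get(KEY));
  }
}
{code}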






[jira] [Updated] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8141:

Target Version/s: 3.2.0  (was: 3.1.1)

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java






[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472771#comment-16472771
 ] 

Eric Yang commented on YARN-8274:
-

[~ebadger] Your earnest advocacy is not going unheard.  I am sorry that I 
introduced bugs during the rebase.  There is no excuse for making mistakes when 
a patch is snowballing.  It won't happen again.

[~jlowe] Nits: It would be nice if the code were refactored to add docker_binary 
in construct_docker_command to avoid duplicating add_to_args for docker_binary 
in all the get_docker_*_command helpers, but the priority is to get a good 
stable state for release.  Hence, I am sorry that I committed this prematurely 
without listening to my inner voice.

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}






[jira] [Created] (YARN-8283) [Umbrella] MaWo - A Master Worker framework on top of YARN Services

2018-05-11 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8283:


 Summary: [Umbrella] MaWo - A Master Worker framework on top of 
YARN Services
 Key: YARN-8283
 URL: https://issues.apache.org/jira/browse/YARN-8283
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Yesha Vora


There is a need for an application / framework to handle Master-Worker 
scenarios. There are existing frameworks on YARN which can be used to run a job 
in a distributed manner, such as Mapreduce, Tez, Spark, etc. But master-worker 
use-cases are usually force-fed into one of these existing frameworks, which 
have been designed primarily around data-parallelism instead of generic 
Master-Worker types of computation.

In this JIRA, we’d like to contribute MaWo - a YARN Service based framework 
that achieves this goal. The overall goal is to create an app that can take an 
input job specification with tasks and their durations, and have a Master dish 
the tasks off to a predetermined set of workers. The components will be 
responsible for making sure that the tasks and the overall job finish within 
specific time durations.

We have been using a version of the MaWo framework for running unit tests of 
Hadoop in a parallel manner on an existing Hadoop YARN cluster. What typically 
takes 10 hours to run all of the Hadoop project’s unit tests can finish in 
under 20 minutes on a MaWo app of about 50 containers!

YARN-3307 was an original attempt at this, but through a first-class YARN app. 
In this JIRA, we instead use YARN Service for orchestration so that our code 
can focus on the core Master-Worker paradigm.






[jira] [Assigned] (YARN-8283) [Umbrella] MaWo - A Master Worker framework on top of YARN Services

2018-05-11 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora reassigned YARN-8283:


Assignee: Yesha Vora

> [Umbrella] MaWo - A Master Worker framework on top of YARN Services
> ---
>
> Key: YARN-8283
> URL: https://issues.apache.org/jira/browse/YARN-8283
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
>
> There is a need for an application / framework to handle Master-Worker 
> scenarios. There are existing frameworks on YARN which can be used to run a 
> job in a distributed manner, such as Mapreduce, Tez, Spark, etc. But 
> master-worker use-cases are usually force-fed into one of these existing 
> frameworks, which have been designed primarily around data-parallelism instead 
> of generic Master-Worker types of computation.
> In this JIRA, we’d like to contribute MaWo - a YARN Service based framework 
> that achieves this goal. The overall goal is to create an app that can take 
> an input job specification with tasks and their durations, and have a Master 
> dish the tasks off to a predetermined set of workers. The components will be 
> responsible for making sure that the tasks and the overall job finish within 
> specific time durations.
> We have been using a version of the MaWo framework for running unit tests of 
> Hadoop in a parallel manner on an existing Hadoop YARN cluster. What 
> typically takes 10 hours to run all of the Hadoop project’s unit tests can 
> finish in under 20 minutes on a MaWo app of about 50 containers!
> YARN-3307 was an original attempt at this, but through a first-class YARN app. 
> In this JIRA, we instead use YARN Service for orchestration so that our code 
> can focus on the core Master-Worker paradigm.






[jira] [Commented] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472761#comment-16472761
 ] 

genericqa commented on YARN-8265:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core:
 The patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
26s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8265 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923092/YARN-8265.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ade0de89ae2d 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4b4f24a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20708/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20708/testReport/ |
| Max. process+thread count | 777 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applicat

[jira] [Assigned] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy

2018-05-11 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-7340:
--

Assignee: Dinesh Chitlangia

> Missing the time stamp in exception message in Class NoOverCommitPolicy
> ---
>
> Key: YARN-7340
> URL: https://issues.apache.org/jira/browse/YARN-7340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: reservation system
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Dinesh Chitlangia
>Priority: Minor
>  Labels: newbie++
>
> The missing time stamp can easily be spotted by reading the code:
> {code}
>   throw new ResourceOverCommitException(
>   "Resources at time " + " would be overcommitted by "
>   + "accepting reservation: " + reservation.getReservationId());
> {code}
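A minimal sketch of the intended message, assuming the instant being validated 
is available as a long in the surrounding NoOverCommitPolicy code (the actual 
fix may interpolate a different value):

{code:java}
// Sketch of the missing interpolation; variable names are assumptions.
public final class OverCommitMessageSketch {
  static String message(long time, String reservationId) {
    return "Resources at time " + time + " would be overcommitted by "
        + "accepting reservation: " + reservationId;
  }

  public static void main(String[] args) {
    // hypothetical reservation id, for illustration only
    System.out.println(message(1526083200000L, "reservation_1525897486447_0001"));
  }
}
{code}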






[jira] [Updated] (YARN-8265) Service AM should retrieve new IP for docker container relaunched by NM

2018-05-11 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8265:
-
Summary: Service AM should retrieve new IP for docker container relaunched 
by NM  (was: AM should retrieve new IP for restarted container)

> Service AM should retrieve new IP for docker container relaunched by NM
> ---
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.
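A rough sketch of the proposed behavior (hypothetical names; the real logic 
lives in the service AM's ComponentInstance and uses its own scheduling 
machinery):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the idea described above: if the container status
// retriever was cancelled after the first IP was obtained, re-schedule it
// when the NM reports a container restart, so the new IP is picked up.
public final class StatusRetrieverSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private volatile ScheduledFuture<?> retriever;

  void startRetriever(Runnable pollContainerStatus) {
    retriever = scheduler.scheduleAtFixedRate(
        pollContainerStatus, 0, 30, TimeUnit.SECONDS);
  }

  void onIpObtained() {
    if (retriever != null) {
      retriever.cancel(false);    // current behavior: stop once an IP is known
    }
  }

  // Callback received when the NM relaunches the docker container.
  void onContainerRestart(Runnable pollContainerStatus) {
    if (retriever == null || retriever.isCancelled()) {
      startRetriever(pollContainerStatus);   // proposed: resume polling
    }
  }
}
{code}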






[jira] [Commented] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy

2018-05-11 Thread Dinesh Chitlangia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472745#comment-16472745
 ] 

Dinesh Chitlangia commented on YARN-7340:
-

[~yufeigu] I would like to work on this.

Could you please assign this to me? Thank you.

> Missing the time stamp in exception message in Class NoOverCommitPolicy
> ---
>
> Key: YARN-7340
> URL: https://issues.apache.org/jira/browse/YARN-7340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: reservation system
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie++
>
> The missing time stamp can easily be spotted by reading the code:
> {code}
>   throw new ResourceOverCommitException(
>   "Resources at time " + " would be overcommitted by "
>   + "accepting reservation: " + reservation.getReservationId());
> {code}






[jira] [Commented] (YARN-8243) Flex down should remove instance with largest component instance ID first

2018-05-11 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472742#comment-16472742
 ] 

Gour Saha commented on YARN-8243:
-

Thanks [~billie.rinaldi] and [~suma.shivaprasad] for reviewing. Also thanks to 
[~billie.rinaldi] for committing the patch.

> Flex down should remove instance with largest component instance ID first
> -
>
> Key: YARN-8243
> URL: https://issues.apache.org/jira/browse/YARN-8243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8243.01.patch, YARN-8243.02.patch
>
>
> This is easy to test on a service with an anti-affinity component, to simulate 
> pending container requests. It can also be simulated by other means (no 
> resources left in the cluster, etc.).
> Service yarnfile used to test this -
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1",
>   "components" :
>   [
> {
>   "name": "ping",
>   "number_of_containers": 2,
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "launch_command": "sleep 9000",
>   "placement_policy": {
> "constraints": [
>   {
> "type": "ANTI_AFFINITY",
> "scope": "NODE",
> "target_tags": [
>   "ping"
> ]
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Launch a service with the above yarnfile as below -
> {code:java}
> yarn app -launch simple-aa-1 simple_AA.json
> {code}
> Let's assume there are only 5 nodes in this cluster. Now, flex the above 
> service to one container more than the number of nodes (6 in my case).
> {code:java}
> yarn app -flex simple-aa-1 -component ping 6
> {code}
> Only 5 containers will be allocated and running for simple-aa-1. At this 
> point, flex it down to 5 containers -
> {code:java}
> yarn app -flex simple-aa-1 -component ping 5
> {code}
> This is what is seen in the serviceam log at this point -
> {noformat}
> 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO  
> service.ClientAMService - Flexing component ping to 5
> 2018-05-03 20:17:38,469 [Component  dispatcher] INFO  component.Component - 
> [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
> 2018-05-03 20:17:38,470 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Flexed down by user, destroying.
> 2018-05-03 20:17:38,473 [Component  dispatcher] INFO  component.Component - 
> [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO  
> registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Deleting registry path 
> /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06
> 2018-05-03 20:17:38,476 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
>   at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
>   at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>   at java.lang.Thread.run(Thread.java:745)
> 2018-05-03 20:17:38,480 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state
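For reference, an illustrative sketch of the behavior in this jira's title, 
i.e. releasing the instances with the largest instance IDs first when flexing 
down (names are invented; the real logic lives in the service AM's Component 
class):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: choose which component instances to release
// when flexing down, largest instance ID first (e.g. ping-5 before ping-4).
public final class FlexDownSketch {
  static List<Integer> pickForRemoval(List<Integer> runningInstanceIds,
      int targetCount) {
    List<Integer> sorted = new ArrayList<>(runningInstanceIds);
    sorted.sort(Comparator.reverseOrder());          // largest IDs first
    int toRemove = Math.max(0, sorted.size() - targetCount);
    return sorted.subList(0, toRemove);
  }

  public static void main(String[] args) {
    // Flexing from 6 running instances down to 5 removes only instance 5.
    System.out.println(pickForRemoval(Arrays.asList(0, 1, 2, 3, 4, 5), 5));
  }
}
{code}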

[jira] [Commented] (YARN-8265) AM should retrieve new IP for restarted container

2018-05-11 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472697#comment-16472697
 ] 

Billie Rinaldi commented on YARN-8265:
--

Patch 2 is allowing the AM to retrieve the new IP, but a couple more things 
need to be fixed. The logging when status is obtained is now too verbose and, 
more importantly, RegistryDNS still knows about the old IP, so it has 2 IPs for 
the container.

> AM should retrieve new IP for restarted container
> -
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Commented] (YARN-3610) FairScheduler: Add steady-fair-shares to the REST API documentation

2018-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472696#comment-16472696
 ] 

Hudson commented on YARN-3610:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14178 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14178/])
YARN-3610. FairScheduler: Add steady-fair-shares to the REST API (haibochen: 
rev 50408cfc6987b554f8f8f3d6711f7fa61c6e6d6f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md


> FairScheduler: Add steady-fair-shares to the REST API documentation
> ---
>
> Key: YARN-3610
> URL: https://issues.apache.org/jira/browse/YARN-3610
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Ray Chiang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-3610.001.patch, YARN-3610.002.patch, 
> YARN-3610.003.patch
>
>
> YARN-1050 adds documentation for FairScheduler REST API, but is missing the 
> steady-fair-share.






[jira] [Commented] (YARN-8265) AM should retrieve new IP for restarted container

2018-05-11 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472693#comment-16472693
 ] 

Billie Rinaldi commented on YARN-8265:
--

How to test this bug:
# Run a simple app with a sleep and exit launch command.
{noformat}
{
  "name": "test-ip-change",
  "version": "1",
  "lifetime": "3600",
  "configuration": {
"properties": {
  "docker.network": "bridge"
}
  },
  "components" :
[
  {
"name": "centos7",
"number_of_containers": 1,
"artifact": {
  "id": "library/centos:7",
  "type": "DOCKER"
},
"launch_command": "sleep 60; exit 1",
"resource": {
  "cpus": 2,
  "memory": "1024"
}
  }
]
}
{noformat}
# Verify that the docker container has started running, and then run the 
following script (assuming you only have one docker container running in your 
test environment). This starts a new container that grabs the IP of the 
service's container as soon as that container goes down, so the relaunched 
service container is forced onto a new IP. Leave this docker container running.
{noformat}
while docker ps | grep container_ > /dev/null; do :; done; docker run -it 
library/centos:7 bash
{noformat}
# After the service's container is restarted, verify that it has a new IP. 
Before the patch is applied, verify the AM and RegistryDNS have not received 
the new IP of the container. After the patch is applied, verify that they do 
receive the new IP.

> AM should retrieve new IP for restarted container
> -
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472688#comment-16472688
 ] 

Billie Rinaldi commented on YARN-8141:
--

[~leftnoteasy], the user should make them STATIC configuration files in that 
case, and the AM will automatically set up 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java






[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2018-05-11 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-4599:
-
Attachment: YARN-4599.006.patch

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4599.000.patch, YARN-4599.001.patch, 
> YARN-4599.002.patch, YARN-4599.003.patch, YARN-4599.004.patch, 
> YARN-4599.005.patch, YARN-4599.006.patch, YARN-4599.sandflee.patch, 
> yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.
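For background, a minimal sketch of the cgroups v1 mechanism this jira builds 
on: writing 1 to a cgroup's memory.oom_control file disables the kernel OOM 
killer for that cgroup, so tasks that exceed their limit are paused rather than 
killed immediately. The mount point and helper below are assumptions, not NM 
configuration:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: disable the OOM killer for one container's memory cgroup.
// The cgroup mount point below is an assumption for illustration.
public final class OomControlSketch {
  public static void disableOomKiller(String containerCgroup)
      throws IOException {
    Path oomControl = Paths.get("/sys/fs/cgroup/memory",
        containerCgroup, "memory.oom_control");
    Files.write(oomControl, "1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}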






[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472679#comment-16472679
 ] 

Eric Badger commented on YARN-8274:
---

bq. With 3.1.1 code freeze on Saturday, it is easy to make mistakes, and I like 
to get YARN-7654 committed before end of today. YARN-7654 and YARN-8207 are 
probably left uncommitted for too long
I understand that you want to get these patches into 3.1.1, but I don't believe 
we should rush to get features into releases and in the process compromise on 
quality. Rushed patches/reviews lead to bugs like this happening at an elevated 
rate. I'm also not particularly compelled by the argument that YARN-7654 and 
YARN-8207 have been uncommitted for too long. YARN-8207 ended up being a 127 kB 
patch of entirely C code, which is incredibly time-consuming to review, while 
YARN-7654 is now on patch number 23. It's not like these aren't getting 
reviewed; they are just going through a normal process of comprehensive review. 
I think that YARN-8207 getting committed in 2 weeks is a semi-miracle given the 
size, complexity, and possible ramifications of the changes. Reviewing that 
much C code (especially in a setuid binary) throughout 10 different patches is 
basically a full-time job. [~jlowe] has spent countless more hours/days than I 
think should be reasonably expected and is still working in an attempt to get 
these patches into 3.1.1. If anything, he should be commended and thanked for 
his yeoman’s effort here regardless of whether YARN-7654 makes it into 3.1.1.

So, while I understand that deadlines exist and that we should strive to meet 
them, I don't believe that we should rush patches in solely because of a 
deadline. That destabilizes the project and causes more work for everyone. If a 
patch/feature isn't fully ready, we should step back and get it into the next 
release rather than cut time on reviews and possibly miss something. At the end 
of the day, if we are introducing bugs like this consistently, which recently 
we have been, then we are clearly iterating too quickly and need to spend more 
time on reviewing each patch instead of rushing them to being committed. 

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor

[jira] [Created] (YARN-8282) [C] Create a JNI interface to interact with Windows

2018-05-11 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-8282:
--

 Summary: [C] Create a JNI interface to interact with Windows
 Key: YARN-8282
 URL: https://issues.apache.org/jira/browse/YARN-8282
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola









[jira] [Assigned] (YARN-8265) AM should retrieve new IP for restarted container

2018-05-11 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi reassigned YARN-8265:


Assignee: Billie Rinaldi  (was: Eric Yang)

> AM should retrieve new IP for restarted container
> -
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.






[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-11 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472663#comment-16472663
 ] 

Miklos Szegedi commented on YARN-8275:
--

[~giovanni.fumarola], I am curious about your opinion on the design of 
YARN-4599. In that case we considered JNI vs. a long-running native process 
communicating with YARN over a pipe. The latter seems better in terms of 
security and maintainability in case some native functions start corrupting 
the JVM heap. There is only a single process start in that case, so it does 
not affect performance. What do you think?

 

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average the NM calls it 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.
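Purely as a hypothetical illustration of what such a JNI interface could look 
like (none of these natives exist in Hadoop, and the library and method names 
are invented):

{code:java}
// Hypothetical sketch: each native call would replace one fork/exec of
// winutils.exe, avoiding the per-invocation process start and the IO ops
// measured above. Nothing here is real Hadoop API.
public final class WindowsNativeSketch {
  static {
    System.loadLibrary("hadoopwin");   // assumed native library name
  }

  public static native void createSymlink(String link, String target);
  public static native boolean isTaskAlive(String taskId);
  public static native String getSystemInfo();
}
{code}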






[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472661#comment-16472661
 ] 

Eric Yang commented on YARN-7654:
-

[~jlowe] Patch 24 fixed the issues above.  I still need time to test all 5 
scenarios to make sure the command doesn't get pre-processed by mistake.  The 
5 scenarios are:

# Mapreduce
# LLAP app
# Docker app with command override
# Docker app with entry point
# Docker app with entry point and no launch command

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, 
> YARN-7654.024.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation.  It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the 
> docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}






[jira] [Commented] (YARN-3610) FairScheduler: Add steady-fair-shares to the REST API documentation

2018-05-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472657#comment-16472657
 ] 

Haibo Chen commented on YARN-3610:
--

+1. Checking this in shortly.

> FairScheduler: Add steady-fair-shares to the REST API documentation
> ---
>
> Key: YARN-3610
> URL: https://issues.apache.org/jira/browse/YARN-3610
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Ray Chiang
>Priority: Major
> Attachments: YARN-3610.001.patch, YARN-3610.002.patch, 
> YARN-3610.003.patch
>
>
> YARN-1050 adds documentation for FairScheduler REST API, but is missing the 
> steady-fair-share.






[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-11 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472653#comment-16472653
 ] 

Suma Shivaprasad commented on YARN-8080:


Fixed the UT failure and some checkstyle issues.

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch
>
>
> Existing native service assumes the service is long running and never 
> finishes. Containers will be restarted even if exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a 
> component to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of 
> container exit status. This is the existing/default behavior.
> 2) Never: Do not restart containers in any case after the container finishes: 
> to support job-like workloads (for example a Tensorflow training job). If a 
> task exits with code == 0, we should not restart the task. This can be used 
> by services which are not restart/recovery-able.
> 3) On-failure: Similar to the above, only restart tasks with exit code != 0. 
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances from 
> the same component won't be affected by the finalized component instance. The 
> service will be terminated once all instances have finalized. 
> 3) For multiple components: the service will be terminated once all 
> components have finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8080) YARN native service should support component restart policy

2018-05-11 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8080:
---
Attachment: YARN-8080.012.patch

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes; containers will be restarted even if the exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a 
> component to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of the 
> container exit status. This is the existing/default behavior.
> 2) Never: do not restart containers in any case after the container finishes. 
> This supports job-like workloads (for example, a Tensorflow training job): if 
> a task exits with code == 0, we should not restart it. This can be used by 
> services which are not restartable/recoverable.
> 3) On-failure: similar to the above, but only restart tasks with exit code != 0. 
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances from the 
> same component won't be affected by the finalized component instance. The 
> service will be terminated once all instances have finalized. 
> 3) For multiple components: the service will be terminated once all components 
> have finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472652#comment-16472652
 ] 

Wangda Tan commented on YARN-8141:
--

[~shaneku...@gmail.com], an example is a user storing hadoop configs on HDFS, 
which are automatically localized to the container's local folder. 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS can then be used to mount 
them when the container is launched. This is useful for the rolling upgrade 
case, where the application should not read /etc/hadoop/conf directly.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic, inside AbstractLauncher.java, overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7654:

Attachment: YARN-7654.024.patch

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch, 
> YARN-7654.024.patch
>
>
> A Docker image may have ENTRY_POINT predefined, but this is not supported in 
> the current implementation.  It would be nice if we can detect the existence 
> of {{launch_command}} and, based on this variable, launch the docker 
> container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472631#comment-16472631
 ] 

Billie Rinaldi commented on YARN-8141:
--

I agree, the user does not need to set 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS. They should go through the 
configuration files API to get local resources mounted into a container.
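For reference, a minimal sketch of what that looks like in a yarnfile (the 
paths and file names here are hypothetical):
{code:java}
"configuration": {
  "files": [
    {
      "type": "STATIC",
      "dest_file": "core-site.xml",
      "src_file": "hdfs:///user/app/conf/core-site.xml"
    }
  ]
}
{code}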

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic, inside AbstractLauncher.java, overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8265) AM should retrieve new IP for restarted container

2018-05-11 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8265:
-
Attachment: YARN-8265.002.patch

> AM should retrieve new IP for restarted container
> -
>
> Key: YARN-8265
> URL: https://issues.apache.org/jira/browse/YARN-8265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Critical
> Attachments: YARN-8265.001.patch, YARN-8265.002.patch
>
>
> When a docker container is restarted, it gets a new IP, but the service AM 
> only retrieves one IP for a container and then cancels the container status 
> retriever. I suspect the issue would be solved by restarting the retriever 
> (if it has been canceled) when the onContainerRestart callback is received, 
> but we'll have to do some testing to make sure this works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472621#comment-16472621
 ] 

Shane Kumpf edited comment on YARN-8141 at 5/11/18 8:41 PM:


{quote}I agree that at this time, it's better to not remove 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS
{quote}
Fair enough. The original environment variable does make the intent explicit 
for relative source paths.

However, YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS won't work for 
'/etc/passwd' as originally described in this issue; that is what 
YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS is for. Can you help me understand the use 
case for allowing the user to provide 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS? I'm struggling to come up 
with a good use case for this.


was (Author: shaneku...@gmail.com):
{quote}I agree that at this time, it's better to not remove 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS
{quote}
Fair enough. The original environment variable does make the user intent 
explicit for relative source paths.

However, YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS won't work for 
'/etc/passwd' as originally described in this issue; that is what 
YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS is for. Can you help me understand the use 
case for allowing the user to provide 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS? I'm struggling to come up 
with a good use case for this.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic, inside AbstractLauncher.java, overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472621#comment-16472621
 ] 

Shane Kumpf commented on YARN-8141:
---

{quote}I agree that at this time, it's better to not remove 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS
{quote}
Fair enough. The original environment variable does make the user intent 
explicit for relative source paths.

However, YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS won't work for 
'/etc/passwd' as originally described in this issue; that is what 
YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS is for. Can you help me understand the use 
case for allowing the user to provide 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS? I'm struggling to come up 
with a good use case for this.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic, inside AbstractLauncher.java, overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-11 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472604#comment-16472604
 ] 

Suma Shivaprasad commented on YARN-8080:


Thanks [~csingh]. Addressed review comments and merged with trunk.

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes; containers will be restarted even if the exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a 
> component to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of the 
> container exit status. This is the existing/default behavior.
> 2) Never: do not restart containers in any case after the container finishes. 
> This supports job-like workloads (for example, a Tensorflow training job): if 
> a task exits with code == 0, we should not restart it. This can be used by 
> services which are not restartable/recoverable.
> 3) On-failure: similar to the above, but only restart tasks with exit code != 0. 
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances from the 
> same component won't be affected by the finalized component instance. The 
> service will be terminated once all instances have finalized. 
> 3) For multiple components: the service will be terminated once all components 
> have finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8080) YARN native service should support component restart policy

2018-05-11 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8080:
---
Attachment: YARN-8080.011.patch

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes; containers will be restarted even if the exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a 
> component to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of the 
> container exit status. This is the existing/default behavior.
> 2) Never: do not restart containers in any case after the container finishes. 
> This supports job-like workloads (for example, a Tensorflow training job): if 
> a task exits with code == 0, we should not restart it. This can be used by 
> services which are not restartable/recoverable.
> 3) On-failure: similar to the above, but only restart tasks with exit code != 0. 
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances from the 
> same component won't be affected by the finalized component instance. The 
> service will be terminated once all instances have finalized. 
> 3) For multiple components: the service will be terminated once all components 
> have finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472597#comment-16472597
 ] 

Jason Lowe commented on YARN-7654:
--

Thanks for updating the patch!

AbstractLauncher still has a setCommand method which is no longer necessary. 
DockerProviderService should call addCommand instead of setCommand. The only 
other place commands can be added to AbstractLauncher is in 
AbstractProviderService#buildLaunchCommand which DockerProviderService 
overrides.

The try-with-resources was added to only one of the writeCommandToTempFile 
methods (yes, oddly there are two), and the one that matters was not the one 
that was updated. As a result the OutputStreamWriter is not closed if anything 
throws, and the printWriter is not closed if some exceptions are thrown. My 
apologies for missing this earlier. The clue was that only one of them checks 
for DockerRunCommand instances.
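For clarity, the shape being suggested is roughly the following (a minimal 
sketch; the method and variable names are illustrative, not the exact ones in 
the patch):
{code:java}
private static java.io.File writeCommandToTempFile(String command,
    java.io.File dir) throws java.io.IOException {
  java.io.File commandFile = java.io.File.createTempFile("docker.", ".cmd", dir);
  // try-with-resources closes both writers even if print or construction throws
  try (java.io.OutputStreamWriter osw = new java.io.OutputStreamWriter(
           new java.io.FileOutputStream(commandFile),
           java.nio.charset.StandardCharsets.UTF_8);
       java.io.PrintWriter printWriter = new java.io.PrintWriter(osw)) {
    printWriter.print(command);
    return commandFile;
  }
}
{code}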

My previous comment on putAll was misconstrued. I wasn't asking for 
DockerRunCommand to have a putAll method, rather that the addEnv method should 
be implemented in terms of userEnv.putAll. putAll is not a very useful method 
name for DockerRunCommand since it's not clear what "all" is. It's not clear 
that it's environment variables that are being added to the command. What I 
originally meant was for DockerRunCommand#addEnv to simply be this:
{code:java}
  public final void addEnv(Map<String, String> environment) {
    userEnv.putAll(environment);
  }
{code}
typo: "use_entry_pont" s/b "use_entry_point"

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch
>
>
> A Docker image may have ENTRY_POINT predefined, but this is not supported in 
> the current implementation.  It would be nice if we can detect the existence 
> of {{launch_command}} and, based on this variable, launch the docker 
> container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8248) Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request

2018-05-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472598#comment-16472598
 ] 

Haibo Chen commented on YARN-8248:
--

Thanks [~snemeth] for updating the patch. I have a few more comments/questions:

1) The AMResourceRequests of an application are already verified in 
RMAppManager.validateAndCreateResourceRequest(), so we don't need to check 
whether they are null inside the fair scheduler any more, IMO. Effectively, 
what I am suggesting is removing
{code:java}
  if (rmApp == null || rmApp.getAMResourceRequests() == null) {
    LOG.debug("rmApp or rmApp.AMResourceRequests was null!");
  }
{code}
2) Isn't Resources.isAnyMajorResourceZero(DOMINANT_RESOURCE_CALCULATOR, 
queueMaxShare) already included in 
!Resources.fitsIn(amResourceRequest.getCapability(), queueMaxShare)? That is, if 
the queue max resource is 0, 
Resources.fitsIn(amResourceRequest.getCapability(), queueMaxShare) would always 
return false. We'd also need to update the diagnostic message accordingly.
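As a quick illustration of that point (a hedged sketch using 
Resource.newInstance and Resources.fitsIn; the numbers are made up):
{code:java}
Resource queueMaxShare = Resource.newInstance(9216, 0); // queue max: 0 vcores
Resource amAsk = Resource.newInstance(1024, 1);         // AM needs 1 vcore
// fitsIn returns false because 1 vcore does not fit into 0 vcores, so a
// zero max resource is already rejected by the fitsIn check alone
boolean fits = Resources.fitsIn(amAsk, queueMaxShare);
{code}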

3) We don't need to check again in FairScheduler.allocate(), because it is 
always called after the app has been accepted, which implies the check has 
already passed.

4) It is not clear to me how testAppRejectedToQueueZeroCapacityOfResource() 
differs from testSchedulingRejectedToQueueZeroCapacityOfResource(). The former 
case includes the latter, doesn't it? If so, I'd propose we get rid of 
testSchedulingRejectedToQueueZeroCapacityOfResource() and the associated tests. 
There is one other case not covered in the unit tests: what if the max resource 
of a queue is not zero, but the AM resource request is larger than the 
maxResources?

5) Some minor issues:

There is an unused import in FairScheduler.java.

Let's rename processEvents() -> addApplication() and processAttempAddedEvent() 
-> addAppAttempt().

Some debug messages tend to describe what the code does, and interpreting the 
debug log without the code at hand can be hard. A few suggestions:

LOG.debug("Assignment of container on node " + node + " is zero!"); -> 
LOG.debug("No container is allocated on node " + node);

"Resource ask %s fits in available node resources %s, but the allocated 
container was null!" -> "Resource ask %s fits in available node resources %s, 
but no container was allocated"

LOG.debug("Assign container precheck was false on node: " + node); -> 
LOG.debug("Assign container precheck on node " + node + " failed");

 

> Job hangs when a queue is specified and the maxResources of the queue cannot 
> satisfy the AM resource request
> 
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other).
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:xml}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472591#comment-16472591
 ] 

genericqa commented on YARN-7654:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 11 new + 113 unchanged - 0 fixed = 124 total (was 113) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m  4s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
36s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black}

[jira] [Commented] (YARN-8248) Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472587#comment-16472587
 ] 

genericqa commented on YARN-8248:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 246 unchanged - 0 fixed = 247 total (was 246) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 
57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 55s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8248 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923072/YARN-8248-006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8d5e00debded 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c1d64d6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20704/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20704/testReport/ |
| Max. process+thread count | 874 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourc

[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472584#comment-16472584
 ] 

Chandni Singh commented on YARN-8141:
-

[~shaneku...@gmail.com] I don't see any issues with the approach you mentioned. 
However, I wanted to avoid making changes to YARN core, which seemed riskier.

I can create another ticket to address consolidating 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} and 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} for the next release 
and work on it.

Let me know if this sounds good to you.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic, inside AbstractLauncher.java, overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8243) Flex down should remove instance with largest component instance ID first

2018-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472579#comment-16472579
 ] 

Hudson commented on YARN-8243:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14177 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14177/])
YARN-8243. Flex down should remove instance with largest component (billie: rev 
ca612e353fc3e3766868ec0816de035e48b1f5b4)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceMaster.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/Component.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java


> Flex down should remove instance with largest component instance ID first
> -
>
> Key: YARN-8243
> URL: https://issues.apache.org/jira/browse/YARN-8243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8243.01.patch, YARN-8243.02.patch
>
>
> This is easy to test on a service with an anti-affinity component, to simulate 
> pending container requests. It can also be simulated by other means (no 
> resources left in the cluster, etc.).
> Service yarnfile used to test this -
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1",
>   "components" :
>   [
> {
>   "name": "ping",
>   "number_of_containers": 2,
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "launch_command": "sleep 9000",
>   "placement_policy": {
> "constraints": [
>   {
> "type": "ANTI_AFFINITY",
> "scope": "NODE",
> "target_tags": [
>   "ping"
> ]
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Launch a service with the above yarnfile as below -
> {code:java}
> yarn app -launch simple-aa-1 simple_AA.json
> {code}
> Let's assume there are only 5 nodes in this cluster. Now, flex the above 
> service to one container more than the number of nodes (6 in my case).
> {code:java}
> yarn app -flex simple-aa-1 -component ping 6
> {code}
> Only 5 containers will be allocated and running for simple-aa-1. At this 
> point, flex it down to 5 containers -
> {code:java}
> yarn app -flex simple-aa-1 -component ping 5
> {code}
> This is what is seen in the serviceam log at this point -
> {noformat}
> 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO  
> service.ClientAMService - Flexing component ping to 5
> 2018-05-03 20:17:38,469 [Component  dispatcher] INFO  component.Component - 
> [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
> 2018-05-03 20:17:38,470 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Flexed down by user, destroying.
> 2018-05-03 20:17:38,473 [Component  dispatcher] INFO  component.Component - 
> [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO  
> registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Deleting registry path 
> /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06
> 2018-05-03 20:17:38,476 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache

[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472573#comment-16472573
 ] 

Wangda Tan commented on YARN-8141:
--

Thanks [~csingh] / [~shaneku...@gmail.com] for the discussion.

I agree that at this time, it's better to not remove 
YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS. As proposed by Chandni, we 
should simply merge what is computed by the service framework with whatever is 
provided by the user.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic, inside AbstractLauncher.java, overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472564#comment-16472564
 ] 

Shane Kumpf edited comment on YARN-8141 at 5/11/18 7:59 PM:


{quote}{{AbstractLauncher}} in Yarn service while creating the 
{{LaunchContext}} doesn't know about the absolute path of the localized 
resource. {quote}

This is true, but that is the case with the current code path as well. 
Resolving the absolute path to the localized resource is handled on the runtime 
side.

I believe we could still consolidate by having 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} support both relative and absolute 
source paths (with the difference in behavior documented). If the source path 
is relative, it is assumed to be a localized resource and is then passed to 
validateMount which validates and returns the absolute path to that localized 
resource. If it an absolute path, the mount is added as-is. Do you see any 
issues with that approach? Does that still meet your original need 
[~leftnoteasy]?


was (Author: shaneku...@gmail.com):
{quote}{{AbstractLauncher}} in Yarn service while creating the 
{{LaunchContext}} doesn't know about the absolute path of the localized 
resource. \{quote}

This is true, but that is the case with the current code path as well. 
Resolving the absolute path to the localized resource is handled on the runtime 
side.

I believe we could still consolidate by having 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} support both relative and absolute 
source paths (with the difference in behavior documented). If the source path 
is relative, it is assumed to be a localized resource and is then passed to 
validateMount which validates and returns the absolute path to that localized 
resource. If it is an absolute path, the mount is added as-is. Do you see any 
issues with that approach? Does that still meet your original need 
[~leftnoteasy]?

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic, inside AbstractLauncher.java, overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472564#comment-16472564
 ] 

Shane Kumpf commented on YARN-8141:
---

{quote}{{AbstractLauncher}} in Yarn service while creating the 
{{LaunchContext}} doesn't know about the absolute path of the localized 
resource. {quote}

This is true, but that is the case with the current code path as well. 
Resolving the absolute path to the localized resource is handled on the runtime 
side.

I believe we could still consolidate by having 
{{YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS}} support both relative and absolute 
source paths (with the difference in behavior documented). If the source path 
is relative, it is assumed to be a localized resource and is then passed to 
validateMount which validates and returns the absolute path to that localized 
resource. If it is an absolute path, the mount is added as-is. Do you see any 
issues with that approach? Does that still meet your original need 
[~leftnoteasy]?
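
A rough sketch of the branching being proposed (all names here are 
illustrative, not the actual runtime API):
{code:java}
for (String mount : env.get("YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS").split(",")) {
  String[] parts = mount.split(":");
  String src = parts[0];
  String dst = parts[1];
  if (src.startsWith("/")) {
    // absolute source path: add the mount as-is
    addMount(src, dst);
  } else {
    // relative source path: assume a localized resource and resolve it via
    // the existing validateMount check before adding
    addMount(validateMount(src, localizedResources), dst);
  }
}
{code}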

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow the 
> user to mount local folders like /etc/passwd, etc.
> The following logic, inside AbstractLauncher.java, overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8248) Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request

2018-05-11 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8248:
-
Summary: Job hangs when a queue is specified and the maxResources of the 
queue cannot satisfy the AM resource request  (was: Job hangs when queue is 
specified and that queue has 0 maxResources of a resource)

> Job hangs when a queue is specified and the maxResources of the queue cannot 
> satisfy the AM resource request
> 
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other).
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:xml}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472548#comment-16472548
 ] 

Eric Yang commented on YARN-8274:
-

[~ebadger] Sorry, my mistake, I thought the report was for the second patch.  
With the 3.1.1 code freeze on Saturday, it is easy to make mistakes, and I 
would like to get YARN-7654 committed before the end of today.  YARN-7654 and 
YARN-8207 have probably been left uncommitted for too long, and it is easy to 
make mistakes when rebasing changes that include logic from other patches, 
including YARN-7973, YARN-8209, YARN-8261, and YARN-8064.  I recommend going 
through YARN-7654 to make sure the rebase was done correctly for those patches.

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}
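
Reading the shell output above, a minimal sketch of what appears to have been 
issued on relaunch versus what the relaunch path should issue; the command 
strings are illustrative, not the exact container-executor output:
{code:java}
// "docker: 'container_...' is not a docker command" implies the container id
// landed in the subcommand position:
String broken   = "docker " + containerId + " ...";
// Relaunch should instead reuse the stopped container:
String expected = "docker start " + containerId;
{code}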



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8244) TestContainerSchedulerQueuing.testStartMultipleContainers failed

2018-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472532#comment-16472532
 ] 

Hudson commented on YARN-8244:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14176 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14176/])
YARN-8244. TestContainerSchedulerQueuing.testStartMultipleContainers (jlowe: 
rev dc912994a1bcb511dfda32a0649cef0c9bdc47d3)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerQueuing.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java


>  TestContainerSchedulerQueuing.testStartMultipleContainers failed
> -
>
> Key: YARN-8244
> URL: https://issues.apache.org/jira/browse/YARN-8244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: YARN-8244.001.patch, YARN-8244.002.patch
>
>
> {code:java}
> testStartMultipleContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing)
>   Time elapsed: 22.198 s  <<< FAILURE!
> java.lang.AssertionError: ContainerState is not correct (timedout)
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:344)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:309)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testStartMultipleContainers(TestContainerSchedulerQueuing.java:256)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){code}
> {code:java}
> 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch 
> (ContainerLaunch.java:call(329)) - Failed to launch container.
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.j

[jira] [Updated] (YARN-8248) Job hangs when queue is specified and that queue has 0 maxResources of a resource

2018-05-11 Thread Szilard Nemeth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8248:
-
Summary: Job hangs when queue is specified and that queue has 0 
maxResources of a resource  (was: Job hangs when queue is specified and that 
queue has 0 capability of a resource)

> Job hangs when queue is specified and that queue has 0 maxResources of a 
> resource
> -
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:java}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-11 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472520#comment-16472520
 ] 

Giovanni Matteo Fumarola commented on YARN-8275:


[~elgoiri] thanks for the comment. I am planning to code everything in Commons 
to be used from YARN and HDFS.

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average the NM calls it 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.
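
A hedged sketch of what such a JNI surface could look like for two of the 
hottest WinUtils calls; the class, library, and method names are assumptions 
for illustration, not an agreed design:
{code:java}
public final class WindowsNative {
  static {
    // assumed native library name, shipped alongside hadoop.dll
    System.loadLibrary("hadoopwinnative");
  }

  // would replace forking "winutils symlink <link> <target>" per call
  public static native void createSymlink(String link, String target);

  // would replace forking "winutils task isAlive <taskName>"
  public static native boolean isTaskAlive(String taskName);

  private WindowsNative() {}
}
{code}
Per the table above, replacing just the -symlink fork would already remove 
roughly 57 process launches per container.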



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472511#comment-16472511
 ] 

Eric Badger commented on YARN-8274:
---

[~eyang], I appreciate you getting on and reviewing this JIRA quickly. However, 
could you give it a little bit of time before committing so that other people 
can take a look? At the very least, you should wait for genericqa to come back: 
when you committed the latest patch, genericqa had only run on the 1st patch.

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472505#comment-16472505
 ] 

genericqa commented on YARN-8274:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
38m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 40s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8274 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923069/YARN-8274.002.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 533c7bbe3e6b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3a93af7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20703/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20703/testReport/ |
| Max. process+thread count | 289 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20703/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>

[jira] [Commented] (YARN-8268) Fair scheduler: reservable queue is configured both as parent and leaf queue

2018-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472502#comment-16472502
 ] 

Hudson commented on YARN-8268:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14175 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14175/])
YARN-8268. Fair scheduler: reservable queue is configured both as parent 
(haibochen: rev 1f10a360219c91ac13d31bdb5c8d302b1b45afc3)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestAllocationFileLoaderService.java


> Fair scheduler: reservable queue is configured both as parent and leaf queue
> 
>
> Key: YARN-8268
> URL: https://issues.apache.org/jira/browse/YARN-8268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8268.000.patch, YARN-8268.001.patch
>
>
> The following allocation file
> {code:java}
> <?xml version="1.0"?>
> <allocations>
>   <queue name="root">
>     <aclSubmitApps>someuser </aclSubmitApps>
>     <queue name="default">
>       <aclSubmitApps>someuser </aclSubmitApps>
>     </queue>
>     <queue name="dedicated">
>       <reservation/>
>       <aclSubmitApps>someuser </aclSubmitApps>
>     </queue>
>   </queue>
>   <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
> </allocations>
> {code}
> is being parsed as: {{PARENT=[root, root.dedicated], LEAF=[root.default, 
> root.dedicated]}} (AllocationConfiguration.configuredQueues).
> The root.dedicated should only appear as a PARENT queue.
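
A minimal sketch of the expected parser behavior, assuming the existing 
{{FSQueueType}} bookkeeping behind AllocationConfiguration.configuredQueues; 
the exact placement inside AllocationFileQueueParser may differ:
{code:java}
// A queue carrying a <reservation> element is reservable and serves as the
// parent of dynamically created reservation queues, so record it as PARENT
// only and make sure it is not also tracked as LEAF.
if (isReservable) {
  configuredQueues.get(FSQueueType.PARENT).add(queueName);
  configuredQueues.get(FSQueueType.LEAF).remove(queueName);
}
{code}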



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8244) TestContainerSchedulerQueuing.testStartMultipleContainers failed

2018-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472493#comment-16472493
 ] 

Jason Lowe commented on YARN-8244:
--

Thanks for updating the patch!

+1 lgtm.  Committing this.


>  TestContainerSchedulerQueuing.testStartMultipleContainers failed
> -
>
> Key: YARN-8244
> URL: https://issues.apache.org/jira/browse/YARN-8244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-8244.001.patch, YARN-8244.002.patch
>
>
> {code:java}
> testStartMultipleContainers(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing)
>   Time elapsed: 22.198 s  <<< FAILURE!
> java.lang.AssertionError: ContainerState is not correct (timedout)
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.assertTrue(Assert.java:41)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:344)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:309)
>     at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testStartMultipleContainers(TestContainerSchedulerQueuing.java:256)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>     at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>     at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>     at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413){code}
> {code:java}
> 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch 
> (ContainerLaunch.java:call(329)) - Failed to launch container.
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1471)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1469)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch$ShellScriptBuilder.orderEnvByDependencies(ContainerLaunch.java:1311)
> at 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.writeLaunchEnv(ContainerExecutor.java:388)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:290)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101

[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472477#comment-16472477
 ] 

Hudson commented on YARN-8274:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14174 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14174/])
YARN-8274. Fixed a bug on docker start command. Contributed (eyang: 
rev 8f7912e0fee5de608ce8824fa5bd81b01b9a7c38)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc


> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472476#comment-16472476
 ] 

Íñigo Goiri commented on YARN-8275:
---

Even though the main use should be in YARN for now, we should do the changes in 
Commons and eventually plug this into HDFS too.

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average the NM calls it 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-11 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472460#comment-16472460
 ] 

Giovanni Matteo Fumarola commented on YARN-8275:


The attached file [^WinUtils-Functions.pdf] shows the current usage (inputs and 
outputs) of all the WinUtils functions. We should design a JNI interface 
aligned with it.

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average the NM calls it 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-11 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8275:
---
Attachment: WinUtils-Functions.pdf

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average the NM calls it 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472450#comment-16472450
 ] 

Chandni Singh edited comment on YARN-8141 at 5/11/18 6:36 PM:
--

[~shaneku...@gmail.com] Thanks for pointing out the issue.
{quote}As a result, the "source" of the mounts added in {{AbstractLauncher}} 
are all relative paths. These need to be specified as absolute paths in the 
final mount that is added to the {{docker run}} command
{quote}
I think it may be better not to get rid of 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} at this point, because 
{{AbstractLauncher}} in YARN service does not know the absolute path of the 
localized resource while creating the {{LaunchContext}}.

I can put up a simple fix to merge the user-specified 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} with the ones computed 
by YARN service, for the issue reported by [~leftnoteasy].

Let me know your thoughts.

 


was (Author: csingh):
[~shaneku...@gmail.com] Thanks for pointing out the issue.
{quote}As a result, the "source" of the mounts added in {{AbstractLauncher}} 
are all relative paths. These need to be specified as absolute paths in the 
final mount that is added to the {{docker run}} command
{quote}
I think it may be better not to get rid of 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} at this point, because 
{{AbstractLauncher}} in YARN service does not know the absolute path of the 
localized resource while creating the {{LaunchContext}}.

I can put up a simple fix to merge the user-specified 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} with the ones computed 
by YARN service.

Let me know your thoughts.

 

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user 
> specified this in service spec or not. It is important to allow user to mount 
> local folders like /etc/passwd, etc.
> Following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}
> Inside AbstractLauncher.java
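
A minimal sketch of the merge proposed in the comments above, reusing {{sb}} 
from the quoted snippet; this is illustrative, not the committed change:
{code:java}
// Append the mounts computed by the service AM to any user-specified value
// instead of overwriting it.
String key = "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";
String userMounts = env.get(key);
String computed = sb.toString();
env.put(key, (userMounts == null || userMounts.isEmpty())
    ? computed
    : userMounts + "," + computed);
{code}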



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8268) Fair scheduler: reservable queue is configured both as parent and leaf queue

2018-05-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472452#comment-16472452
 ] 

Haibo Chen commented on YARN-8268:
--

Thanks [~grepas] for the contribution and [~wilfreds] for the additional 
review. I have checked in the patch to trunk.

> Fair scheduler: reservable queue is configured both as parent and leaf queue
> 
>
> Key: YARN-8268
> URL: https://issues.apache.org/jira/browse/YARN-8268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8268.000.patch, YARN-8268.001.patch
>
>
> The following allocation file
> {code:java}
> <?xml version="1.0"?>
> <allocations>
>   <queue name="root">
>     <aclSubmitApps>someuser </aclSubmitApps>
>     <queue name="default">
>       <aclSubmitApps>someuser </aclSubmitApps>
>     </queue>
>     <queue name="dedicated">
>       <reservation/>
>       <aclSubmitApps>someuser </aclSubmitApps>
>     </queue>
>   </queue>
>   <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
> </allocations>
> {code}
> is being parsed as: {{PARENT=[root, root.dedicated], LEAF=[root.default, 
> root.dedicated]}} (AllocationConfiguration.configuredQueues).
> The root.dedicated should only appear as a PARENT queue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-11 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472450#comment-16472450
 ] 

Chandni Singh commented on YARN-8141:
-

[~shaneku...@gmail.com] Thanks for pointing out the issue.
{quote}As a result, the "source" of the mounts added in {{AbstractLauncher}} 
are all relative paths. These need to be specified as absolute paths in the 
final mount that is added to the {{docker run}} command
{quote}
I think it may be better not to get rid of 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} at this point, because 
{{AbstractLauncher}} in YARN service does not know the absolute path of the 
localized resource while creating the {{LaunchContext}}.

I can put up a simple fix to merge the user-specified 
{{YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS}} with the ones computed 
by YARN service.

Let me know your thoughts.

 

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user 
> specified this in service spec or not. It is important to allow user to mount 
> local folders like /etc/passwd, etc.
> Following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());{code}
> Inside AbstractLauncher.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8272) Several items are missing from Hadoop 3.1.0 documentation

2018-05-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-8272.
--
Resolution: Duplicate

Closing as dup of HADOOP-15374

> Several items are missing from Hadoop 3.1.0 documentation
> -
>
> Key: YARN-8272
> URL: https://issues.apache.org/jira/browse/YARN-8272
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Wangda Tan
>Priority: Blocker
>
> From what I can see, there are several missing items, like GPU / FPGA: 
> http://hadoop.apache.org/docs/current/
> We should add them to hadoop-project/src/site/site.xml in the next release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8272) Several items are missing from Hadoop 3.1.0 documentation

2018-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472447#comment-16472447
 ] 

Wangda Tan commented on YARN-8272:
--

Oh sweet! Thanks [~ajisakaa] and [~tasanuma0829]!

> Several items are missing from Hadoop 3.1.0 documentation
> -
>
> Key: YARN-8272
> URL: https://issues.apache.org/jira/browse/YARN-8272
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Wangda Tan
>Priority: Blocker
>
> From what I can see, there are several missing items, like GPU / FPGA: 
> http://hadoop.apache.org/docs/current/
> We should add them to hadoop-project/src/site/site.xml in the next release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8258) [UI2] New UI webappcontext should inherit all filters from default context

2018-05-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8258:
-
Reporter: Sumana Sathish  (was: Sunil G)

> [UI2] New UI webappcontext should inherit all filters from default context
> --
>
> Key: YARN-8258
> URL: https://issues.apache.org/jira/browse/YARN-8258
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8258.001.patch
>
>
> Thanks [~ssath...@hortonworks.com] for finding this.
> Ideally all filters from the default context have to be inherited by the UI2 
> context as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8280) [UI2] GPU is not present in the drop down box under 'Nodes Heatmap'

2018-05-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8280:
-
Reporter: Sumana Sathish  (was: Gergely Novák)

> [UI2] GPU is not present in the drop down box under 'Nodes Heatmap'
> ---
>
> Key: YARN-8280
> URL: https://issues.apache.org/jira/browse/YARN-8280
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Sumana Sathish
>Assignee: Gergely Novák
>Priority: Major
> Attachments: YARN-8280.001.patch
>
>
> 1. Click on Nodes Heatmap Chart under Nodes Tab.
> 2. No option to select GPU is available under the drop down menu.
> Discovered by [~ssath...@hortonworks.com].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8268) Fair scheduler: reservable queue is configured both as parent and leaf queue

2018-05-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472423#comment-16472423
 ] 

Haibo Chen commented on YARN-8268:
--

+1. Checking this in shortly.

> Fair scheduler: reservable queue is configured both as parent and leaf queue
> 
>
> Key: YARN-8268
> URL: https://issues.apache.org/jira/browse/YARN-8268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8268.000.patch, YARN-8268.001.patch
>
>
> The following allocation file
> {code:java}
> <?xml version="1.0"?>
> <allocations>
>   <queue name="root">
>     <aclSubmitApps>someuser </aclSubmitApps>
>     <queue name="default">
>       <aclSubmitApps>someuser </aclSubmitApps>
>     </queue>
>     <queue name="dedicated">
>       <reservation/>
>       <aclSubmitApps>someuser </aclSubmitApps>
>     </queue>
>   </queue>
>   <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
> </allocations>
> {code}
> is being parsed as: {{PARENT=[root, root.dedicated], LEAF=[root.default, 
> root.dedicated]}} (AllocationConfiguration.configuredQueues).
> The root.dedicated should only appear as a PARENT queue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8274:

Fix Version/s: 3.1.1
   3.2.0

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472408#comment-16472408
 ] 

genericqa commented on YARN-8130:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 19s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8130 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923057/YARN-8130.03.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux eaf463b014d7 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d50c4d7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/20702/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20702/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hado

[jira] [Commented] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472407#comment-16472407
 ] 

Eric Yang commented on YARN-7654:
-

[~jlowe] Patch 23 includes all your suggestions.

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation.  It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the 
> docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}
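
A small sketch of that detection logic; {{runDockerCommand}} and the variables 
are illustrative assumptions, not the runtime's actual API:
{code:java}
// Rely on the image's ENTRY_POINT only when the spec provides no
// launch_command; otherwise run the container and exec the command in it.
boolean useEntryPoint = launchCommand == null || launchCommand.trim().isEmpty();
if (useEntryPoint) {
  runDockerCommand("docker run " + image + ":" + version);
} else {
  runDockerCommand("docker run " + image + ":" + version);
  runDockerCommand("docker exec " + containerId + " " + launchCommand);
}
{code}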



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-11 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7654:

Attachment: YARN-7654.023.patch

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch, YARN-7654.022.patch, YARN-7654.023.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported in 
> the current implementation.  It would be nice if we could detect the existence of 
> {{launch_command}} and, based on this variable, launch the docker container in 
> different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource

2018-05-11 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472399#comment-16472399
 ] 

Szilard Nemeth commented on YARN-8248:
--

Thanks [~haibochen] for your answers.

It makes sense now why you wanted to remove changes from Resources.

Also, I changed the scope of the debug logs and kept only those that cover "edge" 
cases and come up more rarely.

About the 3rd point: I checked the writeLock's scope, but it cannot be reduced, 
since the following two lines must be called while the writeLock is held: 
{code:java}
RMApp rmApp = rmContext.getRMApps().get(applicationId); 
FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
{code}
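
For clarity, a minimal sketch of the scope in question (the surrounding method 
and field names are assumed, not the exact patch):
{code:java}
writeLock.lock();
try {
  // Both calls must run under the same lock so the app cannot be removed
  // (or the queue reassigned) between the lookup and the assignment.
  RMApp rmApp = rmContext.getRMApps().get(applicationId);
  FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
  // ... submission continues while the writeLock is still held ...
} finally {
  writeLock.unlock();
}
{code}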
 

Please check my updated patch!

Thanks!

> Job hangs when queue is specified and that queue has 0 capability of a 
> resource
> ---
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other).
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:xml}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8281) [Java] Create a JNI interface to interact with Windows

2018-05-11 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8281:
---
Description: This JIRA tracks the design/implementation of the Java layer 
for the JNI interface to interact with Windows.

> [Java] Create a JNI interface to interact with Windows
> --
>
> Key: YARN-8281
> URL: https://issues.apache.org/jira/browse/YARN-8281
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
>
> This JIRA tracks the design/implementation of the Java layer for the JNI 
> interface to interact with Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource

2018-05-11 Thread Szilard Nemeth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8248:
-
Attachment: YARN-8248-006.patch

> Job hangs when queue is specified and that queue has 0 capability of a 
> resource
> ---
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other).
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:xml}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8281) [Java] Create a JNI interface to interact with Windows

2018-05-11 Thread Giovanni Matteo Fumarola (JIRA)
Giovanni Matteo Fumarola created YARN-8281:
--

 Summary: [Java] Create a JNI interface to interact with Windows
 Key: YARN-8281
 URL: https://issues.apache.org/jira/browse/YARN-8281
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Giovanni Matteo Fumarola
Assignee: Giovanni Matteo Fumarola






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472392#comment-16472392
 ] 

Eric Yang commented on YARN-8274:
-

Sorry, the code was missed during refactoring.

+1, the change looks good.  

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-05-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472387#comment-16472387
 ] 

Haibo Chen commented on YARN-8130:
--

+1 pending Jenkins.

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8130.01.patch, YARN-8130.02.patch, 
> YARN-8130.03.patch
>
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02* is the one that has the finished 
> event missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472371#comment-16472371
 ] 

genericqa commented on YARN-8274:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
37m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 18s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 53s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8274 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923054/YARN-8274.001.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 47161a3a09cf 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a922b9c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/20701/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20701/testReport/ |
| Max. process+thread count | 335 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20701/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with 

[jira] [Updated] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-8274:
-
Attachment: YARN-8274.002.patch

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-8274.001.patch, YARN-8274.002.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472345#comment-16472345
 ] 

Jason Lowe commented on YARN-8274:
--

I think I found the bug with the container relaunch and the unrecognized 
command.  get_docker_start_command is not adding the docker binary as the first 
argument, so argv[0] == "start" rather than "/usr/bin/docker"; docker then looks 
at argv[1] to determine the command name, and that's a container ID rather than 
a docker command.  I'll update the patch shortly.
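
To make the argv layout concrete, a small illustrative sketch (the actual fix 
lands in the native container-executor; the container id is taken from the log 
quoted below):
{code:java}
import java.io.IOException;

public class DockerStartArgvSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Buggy layout from get_docker_start_command: argv[0] == "start", so docker
    // reads the container id at argv[1] as its subcommand and rejects it.
    // Fixed layout: the docker binary leads the argv and "start" shifts to argv[1].
    String[] argv = { "/usr/bin/docker", "start",
        "container_1525897486447_0003_01_02" };
    Process p = new ProcessBuilder(argv).inheritIO().start();
    System.exit(p.waitFor());
  }
}
{code}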

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-8274.001.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8274) Docker command error during container relaunch

2018-05-11 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472343#comment-16472343
 ] 

Billie Rinaldi commented on YARN-8274:
--

Okay, it does look like my container is properly relaunched when YARN-8207 
isn't applied.

> Docker command error during container relaunch
> --
>
> Key: YARN-8274
> URL: https://issues.apache.org/jira/browse/YARN-8274
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Billie Rinaldi
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-8274.001.patch
>
>
> I initiated container relaunch with a "sleep 60; exit 1" launch command and 
> saw a "not a docker command" error on relaunch. Haven't figured out why this 
> is happening, but it seems like it has been introduced recently to 
> trunk/branch-3.1. cc [~shaneku...@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Relaunch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from 
> container-launch.
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: 
> container_1525897486447_0003_01_02
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception 
> message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error 
> output: docker: 'container_1525897486447_0003_01_02' is not a docker 
> command.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource

2018-05-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472335#comment-16472335
 ] 

Haibo Chen commented on YARN-8248:
--

Thanks [~snemeth] for the clarification.

I think in general it's okay to do some minor code/style cleanup around the 
same area while working on a patch. It becomes overhead, however, if the minor 
changes cause confusion or are too remote from your core change. Folks reading the 
commit history would also have questions without digging into the Jira 
discussion. Hence, I'd prefer, in this case, to leave Resources as is. If there 
are many cleanup issues you find, a separate patch is justifiable.

I agree with you that debug logs help debugging, especially on the unhappy/abnormal 
code path. However, if we add too much logging on the happy/hot code path, the 
logs will be flooded.
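
A minimal sketch of the convention this suggests (SLF4J-style logger assumed; 
names are illustrative):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class QueueLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(QueueLoggingSketch.class);

  void onSubmission(String queueName, String applicationId) {
    // Hot-path message: keep it at DEBUG and guard it so the happy path
    // does no logging work unless debugging is enabled.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Considering queue {} for app {}", queueName, applicationId);
    }
    // Rare edge case: log at a level that is visible without DEBUG enabled.
    LOG.warn("Queue {} has zero capability for a requested resource", queueName);
  }
}
{code}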

> Job hangs when queue is specified and that queue has 0 capability of a 
> resource
> ---
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other).
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:xml}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


