[jira] [Commented] (YARN-8239) [UI2] Clicking on Node Manager UI under AM container info / App Attempt page goes to old RM UI

2018-05-02 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461948#comment-16461948
 ] 

Rohith Sharma K S commented on YARN-8239:
-

I have changed the reporter to Sumana. Thanks, Sumana.

> [UI2] Clicking on Node Manager UI under AM container info / App Attempt page 
> goes to old RM UI
> --
>
> Key: YARN-8239
> URL: https://issues.apache.org/jira/browse/YARN-8239
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8239.001.patch
>
>
> "Grid View" and in Containers page of both "Grid" and "Graph" views links to 
> old ui.






[jira] [Updated] (YARN-8239) [UI2] Clicking on Node Manager UI under AM container info / App Attempt page goes to old RM UI

2018-05-02 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8239:

Reporter: Sumana Sathish  (was: Sunil G)

> [UI2] Clicking on Node Manager UI under AM container info / App Attempt page 
> goes to old RM UI
> --
>
> Key: YARN-8239
> URL: https://issues.apache.org/jira/browse/YARN-8239
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Sumana Sathish
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8239.001.patch
>
>
> "Grid View" and in Containers page of both "Grid" and "Graph" views links to 
> old ui.






[jira] [Commented] (YARN-8239) [UI2] Clicking on Node Manager UI under AM container info / App Attempt page goes to old RM UI

2018-05-02 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461947#comment-16461947
 ] 

Sunil G commented on YARN-8239:
---

Thanks [~ssath...@hortonworks.com] for reporting this. [~rohithsharma], could 
you please help review this?

> [UI2] Clicking on Node Manager UI under AM container info / App Attempt page 
> goes to old RM UI
> --
>
> Key: YARN-8239
> URL: https://issues.apache.org/jira/browse/YARN-8239
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-8239.001.patch
>
>
> "Grid View" and in Containers page of both "Grid" and "Graph" views links to 
> old ui.






[jira] [Updated] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-05-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8242:
-
Target Version/s: 3.1.1
Priority: Blocker  (was: Major)

> YARN NM: OOM error while reading back the state store on recovery
> -
>
> Key: YARN-8242
> URL: https://issues.apache.org/jira/browse/YARN-8242
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Kanwaljeet Sachdev
>Assignee: Kanwaljeet Sachdev
>Priority: Blocker
> Attachments: YARN-8242.001.patch
>
>
> On startup the NM reads its state store and builds a list of applications in 
> the state store to process. If the number of applications in the state store 
> is large and they have a lot of "state" associated with them, the NM can run 
> out of memory (OOM) and never get to the point where it can start processing 
> the recovery.
> Since it never starts the recovery, there is no way for the NM to ever get 
> past this point. It requires a change in heap size to get the NM started.
>  
> Following is the stack trace
> {code:java}
> at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
> at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
> at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
> at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
> {code}
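
For context, a minimal self-contained sketch of the failure pattern described above (all names here are hypothetical stand-ins, not the actual NMLeveldbStateStoreService code): every stored container record is deserialized and accumulated in memory before recovery processing begins, so heap usage grows with the total amount of stored state rather than staying bounded.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy "state store" holding one serialized blob per
// container. recoverAll() mirrors the eager pattern in the stack trace above:
// parse every record up front and keep all results in a list, so memory use is
// proportional to the total amount of stored state.
public class EagerRecoverySketch {
  static List<byte[]> stateStore = new ArrayList<>();

  static List<String> recoverAll() {
    List<String> recovered = new ArrayList<>();
    for (byte[] blob : stateStore) {
      // Stand-in for StartContainerRequestProto.parseFrom(blob); the whole
      // parsed object graph stays referenced until recovery finishes.
      recovered.add(new String(blob, StandardCharsets.UTF_8));
    }
    return recovered;
  }

  public static void main(String[] args) {
    for (int i = 0; i < 1000; i++) {
      stateStore.add(("container_" + i).getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("Recovered " + recoverAll().size() + " containers");
  }
}
{code}

A fix along these lines would process records as they are read or bound how much is retained at once; see the attached patch for the actual approach taken.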






[jira] [Updated] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-05-02 Thread Kanwaljeet Sachdev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanwaljeet Sachdev updated YARN-8242:
-
Attachment: YARN-8242.001.patch

> YARN NM: OOM error while reading back the state store on recovery
> -
>
> Key: YARN-8242
> URL: https://issues.apache.org/jira/browse/YARN-8242
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Kanwaljeet Sachdev
>Assignee: Kanwaljeet Sachdev
>Priority: Major
> Attachments: YARN-8242.001.patch
>
>
> On startup the NM reads its state store and builds a list of applications in 
> the state store to process. If the number of applications in the state store 
> is large and they have a lot of "state" associated with them, the NM can run 
> out of memory (OOM) and never get to the point where it can start processing 
> the recovery.
> Since it never starts the recovery, there is no way for the NM to ever get 
> past this point. It requires a change in heap size to get the NM started.
>  
> Following is the stack trace
> {code:java}
> at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
> at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
> at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
> at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
> {code}






[jira] [Assigned] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-05-02 Thread Kanwaljeet Sachdev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanwaljeet Sachdev reassigned YARN-8242:


Assignee: Kanwaljeet Sachdev

> YARN NM: OOM error while reading back the state store on recovery
> -
>
> Key: YARN-8242
> URL: https://issues.apache.org/jira/browse/YARN-8242
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Kanwaljeet Sachdev
>Assignee: Kanwaljeet Sachdev
>Priority: Major
>
> On startup the NM reads its state store and builds a list of applications in 
> the state store to process. If the number of applications in the state store 
> is large and they have a lot of "state" associated with them, the NM can run 
> out of memory (OOM) and never get to the point where it can start processing 
> the recovery.
> Since it never starts the recovery, there is no way for the NM to ever get 
> past this point. It requires a change in heap size to get the NM started.
>  
> Following is the stack trace
> {code:java}
> at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
> at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
> at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
> at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
> at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
> at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
> at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
> at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
> at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
> at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
> {code}






[jira] [Created] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-05-02 Thread Kanwaljeet Sachdev (JIRA)
Kanwaljeet Sachdev created YARN-8242:


 Summary: YARN NM: OOM error while reading back the state store on 
recovery
 Key: YARN-8242
 URL: https://issues.apache.org/jira/browse/YARN-8242
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.2.0
Reporter: Kanwaljeet Sachdev


On startup the NM reads its state store and builds a list of applications in the 
state store to process. If the number of applications in the state store is 
large and they have a lot of "state" associated with them, the NM can run out of 
memory (OOM) and never get to the point where it can start processing the recovery.
Since it never starts the recovery, there is no way for the NM to ever get past 
this point. It requires a change in heap size to get the NM started.

 

Following is the stack trace
{code:java}
at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
{code}






[jira] [Updated] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain

2018-05-02 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-7933:

Attachment: YARN-7933.04.patch

> [atsv2 read acls] Add TimelineWriter#writeDomain 
> -
>
> Key: YARN-7933
> URL: https://issues.apache.org/jira/browse/YARN-7933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-7933.01.patch, YARN-7933.02.patch, 
> YARN-7933.03.patch, YARN-7933.04.patch
>
>
>  
> Add an API TimelineWriter#writeDomain for writing the domain info 
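
For illustration only, a rough sketch of what such an API could look like; the names, signature, and domain fields below are assumptions for discussion, not the interface from the patch.

{code:java}
// Hypothetical shape of a writeDomain API on a timeline writer; the actual
// TimelineWriter interface and TimelineDomain type in the patch may differ.
interface DomainWriter {
  /** Persist the ACL/domain information for the given cluster and domain. */
  void writeDomain(String clusterId, Domain domain) throws java.io.IOException;
}

// Minimal stand-in for the domain info being written (owner plus read/write ACLs).
class Domain {
  String id;
  String owner;
  String readers;   // comma-separated users/groups allowed to read
  String writers;   // comma-separated users/groups allowed to write
}
{code}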






[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure

2018-05-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461837#comment-16461837
 ] 

Naganarasimha G R commented on YARN-7892:
-

[~sunilg] & [~bibinchundatt],

Uploaded a patch fixing the Jenkins-reported issue. Some of the test failures 
are not related to the patch modifications, and the ASF license issue is also 
unrelated.

> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch
>
>
> In the existing structure, the type and value are kept along with the 
> attribute, which confuses users of the APIs: it is not clear what needs to be 
> sent for type and value while fetching the mappings for node(s).
> In addition, equals will not make sense when we compare only prefix and 
> name, since the values might be different.
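
A small sketch of the kind of separation being discussed, where the identifying part (prefix + name) is split out so equality is defined on it alone. The class and method names here are illustrative assumptions, not necessarily what the attached patches introduce.

{code:java}
import java.util.Objects;

// Identity of an attribute: only prefix and name participate in equals/hashCode,
// so two attributes with different values still map to the same key.
class NodeAttributeKey {
  final String prefix;
  final String name;

  NodeAttributeKey(String prefix, String name) {
    this.prefix = prefix;
    this.name = name;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof NodeAttributeKey)) return false;
    NodeAttributeKey other = (NodeAttributeKey) o;
    return prefix.equals(other.prefix) && name.equals(other.name);
  }

  @Override
  public int hashCode() {
    return Objects.hash(prefix, name);
  }

  public static void main(String[] args) {
    NodeAttributeKey a = new NodeAttributeKey("rm.yarn.io", "os");
    NodeAttributeKey b = new NodeAttributeKey("rm.yarn.io", "os");
    System.out.println(a.equals(b));  // true, regardless of attribute values
  }
}

// The full attribute carries the key plus type and value; callers fetching
// mappings for nodes only need to supply the key part.
class NodeAttributeInfo {
  final NodeAttributeKey key;
  final String type;   // e.g. STRING
  final String value;

  NodeAttributeInfo(NodeAttributeKey key, String type, String value) {
    this.key = key;
    this.type = type;
    this.value = value;
  }
}
{code}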






[jira] [Updated] (YARN-7892) Revisit NodeAttribute class structure

2018-05-02 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-7892:

Attachment: YARN-7892-YARN-3409.005.patch

> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch
>
>
> In the existing structure, the type and value are kept along with the 
> attribute, which confuses users of the APIs: it is not clear what needs to be 
> sent for type and value while fetching the mappings for node(s).
> In addition, equals will not make sense when we compare only prefix and 
> name, since the values might be different.






[jira] [Commented] (YARN-8151) Yarn RM Epoch should wrap around

2018-05-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461802#comment-16461802
 ] 

Hudson commented on YARN-8151:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14116 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14116/])
YARN-8151. Yarn RM Epoch should wrap around. Contributed by Young Chen. 
(inigoiri: rev e6a80e476d4348a4373e6dd5792d70edff16516f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestLeveldbRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java


> Yarn RM Epoch should wrap around
> 
>
> Key: YARN-8151
> URL: https://issues.apache.org/jira/browse/YARN-8151
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Young Chen
>Assignee: Young Chen
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8151.01.patch, YARN-8151.01.patch, 
> YARN-8151.02.patch, YARN-8151.03.patch, YARN-8151.04.patch, YARN-8151.05.patch
>
>
> Right now, RM epoch values in sub-clusters are seeded in different ranges: 0, 
> 1000, 2000, etc. If one RM restarts often enough, its epoch can increment until 
> it clashes with a neighboring sub-cluster's range, e.g. 999 -> 1000. To fix 
> this, we introduce a configurable range within which the epoch generation is bound.
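
For illustration, a minimal sketch of the wrap-around idea under the assumption that each sub-cluster has a base seed and a configurable range width; the class name and fields are hypothetical, not the code in the committed patch.

{code:java}
// Hypothetical epoch generator: epochs stay within [base, base + range), so a
// frequently restarting RM never drifts into a neighboring sub-cluster's range.
class EpochGenerator {
  private final long base;   // per-sub-cluster seed, e.g. 0, 1000, 2000, ...
  private final long range;  // configurable width of the epoch window

  EpochGenerator(long base, long range) {
    this.base = base;
    this.range = range;
  }

  long next(long currentEpoch) {
    long offset = (currentEpoch - base + 1) % range;
    return base + offset;
  }

  public static void main(String[] args) {
    EpochGenerator gen = new EpochGenerator(0, 1000);
    // 999 wraps back to 0 instead of clashing with the next sub-cluster's 1000.
    System.out.println(gen.next(999));  // prints 0
  }
}
{code}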






[jira] [Commented] (YARN-7715) Update CPU and Memory cgroups params on container update as well.

2018-05-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461797#comment-16461797
 ] 

Miklos Szegedi commented on YARN-7715:
--

[~asuresh], [~haibo.chen] I attached a patch with my proposal.

> Update CPU and Memory cgroups params on container update as well.
> -
>
> Key: YARN-7715
> URL: https://issues.apache.org/jira/browse/YARN-7715
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: YARN-7715.000.patch
>
>
> In YARN-6673 and YARN-6674, the cgroups resource handlers update the cgroups 
> params for the containers, based on whether they are opportunistic or 
> guaranteed, in the *preStart* method.
> Now that YARN-5085 is in, the container executionType (as well as the cpu, 
> memory and any other resources) can be updated after the container has started. 
> This means we need the ability to change cgroups params after container start.
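
As a rough illustration of what an "update after start" hook amounts to: rewriting the running container's cgroup parameter files rather than setting them only in preStart. The paths below assume a cgroup v1 layout with a hadoop-yarn hierarchy, and the class and method names are hypothetical, not the actual resource handler API.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: rewrite cpu/memory cgroup parameters for an already
// running container. Requires the cgroup directories to exist.
class CgroupsUpdateSketch {
  private final Path cgroupRoot;  // e.g. /sys/fs/cgroup (assumed cgroup v1 layout)

  CgroupsUpdateSketch(Path cgroupRoot) {
    this.cgroupRoot = cgroupRoot;
  }

  void updateContainer(String containerId, int cpuShares, long memoryLimitBytes)
      throws IOException {
    write(cgroupRoot.resolve("cpu/hadoop-yarn/" + containerId + "/cpu.shares"),
        Long.toString(cpuShares));
    write(cgroupRoot.resolve("memory/hadoop-yarn/" + containerId
        + "/memory.limit_in_bytes"), Long.toString(memoryLimitBytes));
  }

  private void write(Path file, String value) throws IOException {
    Files.write(file, value.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // Example invocation; the container id here is a placeholder.
    new CgroupsUpdateSketch(Paths.get("/sys/fs/cgroup"))
        .updateContainer("example-container", 1024, 2L << 30);
  }
}
{code}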






[jira] [Updated] (YARN-7715) Update CPU and Memory cgroups params on container update as well.

2018-05-02 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7715:
-
Attachment: YARN-7715.000.patch

> Update CPU and Memory cgroups params on container update as well.
> -
>
> Key: YARN-7715
> URL: https://issues.apache.org/jira/browse/YARN-7715
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: YARN-7715.000.patch
>
>
> In YARN-6673 and YARN-6674, the cgroups resource handlers update the cgroups 
> params for the containers, based on whether they are opportunistic or 
> guaranteed, in the *preStart* method.
> Now that YARN-5085 is in, the container executionType (as well as the cpu, 
> memory and any other resources) can be updated after the container has started. 
> This means we need the ability to change cgroups params after container start.






[jira] [Commented] (YARN-8194) Exception when reinitializing a container using LinuxContainerExecutor

2018-05-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461787#comment-16461787
 ] 

Hudson commented on YARN-8194:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14115 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14115/])
YARN-8194.  Fixed reinitialization error for LinuxContainerExecutor. 
(eyang: rev f4d280f02b557885cd5e5cf36abc36eb579ccfb4)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerRelaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
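
Purely as an illustration of the kind of cleanup that avoids the "File exists" failure quoted in the description below (this sketch is not the committed patch, and the file names are hypothetical): the relaunch path removes leftovers from the previous attempt so the container-executor can recreate them.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch: before a container relaunch, delete the launch script
// and related files left over from the previous attempt instead of failing
// when the container-executor tries to create them again.
class RelaunchCleanupSketch {
  static void cleanupPreviousAttempt(Path containerWorkDir) throws IOException {
    for (String name : new String[] {
        "launch_container.sh", "container_tokens", "container.pid"}) {
      Files.deleteIfExists(containerWorkDir.resolve(name));
    }
  }

  public static void main(String[] args) throws IOException {
    cleanupPreviousAttempt(Paths.get("/tmp/example-container-dir"));
  }
}
{code}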


> Exception when reinitializing a container using LinuxContainerExecutor
> --
>
> Key: YARN-8194
> URL: https://issues.apache.org/jira/browse/YARN-8194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: YARN-8194.001.patch
>
>
> When a component instance is upgraded and the container executor is set to 
> {{org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor}}, then 
> the following exception is seen in the nodemanager:
> {code}
> Writing to cgroup task files...
> Creating local dirs...
> Can't open 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh
>  for output - File exists
> Getting exit code file...
> Creating script paths...
> Full command array for failed execution:
> [/usr/local/hadoop-3.2.0-SNAPSHOT/bin/container-executor, hbase, hbase, 1, 
> application_1524242413029_0001, container_1524242413029_0001_01_02, 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/container_1524242413029_0001_01_02.tokens,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/container_1524242413029_0001_01_02.pid,
>  /tmp/hadoop-yarn/nm-local-dir, 
> /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs, cgroups=none]
> 2018-04-20 16:50:16,641 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Launch container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=33: Could not create copy file 3 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh
> Could not create local files and directories
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:118)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: ExitCodeException exitCode=33: Could not create copy file 3 
> 

[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-02 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461771#comment-16461771
 ] 

genericqa commented on YARN-8206:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 42m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 98m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8206 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12921647/YARN-8206.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ae4c1cd835bb 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6b63a0a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20576/testReport/ |
| Max. process+thread count | 340 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20576/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Sending a kill does not immediately kill docker containers
> 

[jira] [Commented] (YARN-8163) Add support for Node Labels in opportunistic scheduling.

2018-05-02 Thread Abhishek Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461772#comment-16461772
 ] 

Abhishek Modi commented on YARN-8163:
-

Thanks [~giovanni.fumarola] for review. I will address the review comments and 
submit an updated patch.

> Add support for Node Labels in opportunistic scheduling.
> 
>
> Key: YARN-8163
> URL: https://issues.apache.org/jira/browse/YARN-8163
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8163.002.patch, YARN-8163.patch
>
>
> Currently, the Opportunistic Scheduler doesn't honor node label constraints 
> and schedules containers based only on locality and load constraints. This Jira 
> is to add support in opportunistic scheduling to honor node labels in resource 
> requests.
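
As a rough illustration of what honoring node labels in opportunistic allocation could involve (the names below are assumptions, not the actual OpportunisticContainerAllocator code): the candidate node set is filtered by the request's partition before the locality/load ranking is applied.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Illustrative only: keep just the nodes whose partition (node label) matches
// the partition asked for in the resource request; an empty/null partition
// means the default partition.
class NodeLabelFilterSketch {
  static class NodeInfo {
    final String nodeId;
    final String partition;  // node label, "" for the default partition
    NodeInfo(String nodeId, String partition) {
      this.nodeId = nodeId;
      this.partition = partition == null ? "" : partition;
    }
  }

  static List<NodeInfo> filterByPartition(List<NodeInfo> nodes, String requested) {
    String wanted = requested == null ? "" : requested;
    List<NodeInfo> matching = new ArrayList<>();
    for (NodeInfo node : nodes) {
      if (Objects.equals(node.partition, wanted)) {
        matching.add(node);
      }
    }
    return matching;  // locality/load based selection would run on this subset
  }

  public static void main(String[] args) {
    List<NodeInfo> nodes = Arrays.asList(
        new NodeInfo("n1", ""), new NodeInfo("n2", "gpu"));
    System.out.println(filterByPartition(nodes, "gpu").size());  // 1
  }
}
{code}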






[jira] [Commented] (YARN-8151) Yarn RM Epoch should wrap around

2018-05-02 Thread Íñigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461768#comment-16461768
 ] 

Íñigo Goiri commented on YARN-8151:
---

The failed unit test seems unrelated, and this feature is disabled by default, 
so the behavior should stay the same.
Committing to trunk.
Thanks [~youchen] for the patch and [~giovanni.fumarola] for the review.

> Yarn RM Epoch should wrap around
> 
>
> Key: YARN-8151
> URL: https://issues.apache.org/jira/browse/YARN-8151
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Young Chen
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8151.01.patch, YARN-8151.01.patch, 
> YARN-8151.02.patch, YARN-8151.03.patch, YARN-8151.04.patch, YARN-8151.05.patch
>
>
> Right now, RM epoch values in sub-clusters are seeded in different ranges: 0, 
> 1000, 2000, etc. If one RM restarts often enough, its epoch can increment until 
> it clashes with a neighboring sub-cluster's range, e.g. 999 -> 1000. To fix 
> this, we introduce a configurable range within which the epoch generation is bound.






[jira] [Commented] (YARN-8151) Yarn RM Epoch should wrap around

2018-05-02 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461765#comment-16461765
 ] 

Giovanni Matteo Fumarola commented on YARN-8151:


+1 from my side.

> Yarn RM Epoch should wrap around
> 
>
> Key: YARN-8151
> URL: https://issues.apache.org/jira/browse/YARN-8151
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Young Chen
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8151.01.patch, YARN-8151.01.patch, 
> YARN-8151.02.patch, YARN-8151.03.patch, YARN-8151.04.patch, YARN-8151.05.patch
>
>
> Right now, RM epoch values in sub-clusters are seeded in different ranges: 0, 
> 1000, 2000, etc. If one RM restarts often enough, its epoch can increment until 
> it clashes with a neighboring sub-cluster's range, e.g. 999 -> 1000. To fix 
> this, we introduce a configurable range within which the epoch generation is bound.






[jira] [Updated] (YARN-8194) Exception when reinitializing a container using LinuxContainerExecutor

2018-05-02 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8194:

Fix Version/s: (was: 3.1.1)
   3.2.0

> Exception when reinitializing a container using LinuxContainerExecutor
> --
>
> Key: YARN-8194
> URL: https://issues.apache.org/jira/browse/YARN-8194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: YARN-8194.001.patch
>
>
> When a component instance is upgraded and the container executor is set to 
> {{org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor}}, then 
> the following exception is seen in the nodemanager:
> {code}
> Writing to cgroup task files...
> Creating local dirs...
> Can't open 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh
>  for output - File exists
> Getting exit code file...
> Creating script paths...
> Full command array for failed execution:
> [/usr/local/hadoop-3.2.0-SNAPSHOT/bin/container-executor, hbase, hbase, 1, 
> application_1524242413029_0001, container_1524242413029_0001_01_02, 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/container_1524242413029_0001_01_02.tokens,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/container_1524242413029_0001_01_02.pid,
>  /tmp/hadoop-yarn/nm-local-dir, 
> /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs, cgroups=none]
> 2018-04-20 16:50:16,641 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Launch container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=33: Could not create copy file 3 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh
> Could not create local files and directories
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:118)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: ExitCodeException exitCode=33: Could not create copy file 3 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh
> Could not create local files and directories
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> ... 11 more
> 2018-04-20 16:50:16,642 WARN 
> 

[jira] [Commented] (YARN-8194) Exception when reinitializing a container using LinuxContainerExecutor

2018-05-02 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461759#comment-16461759
 ] 

Eric Yang commented on YARN-8194:
-

[~csingh] There is no container relaunch committed to branch-3.1. I updated the 
target version to 3.2 only.

+1 for commit. The patch works as intended.

> Exception when reinitializing a container using LinuxContainerExecutor
> --
>
> Key: YARN-8194
> URL: https://issues.apache.org/jira/browse/YARN-8194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Blocker
> Fix For: 3.2.0
>
> Attachments: YARN-8194.001.patch
>
>
> When a component instance is upgraded and the container executor is set to 
> {{org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor}}, then 
> the following exception is seen in the nodemanager:
> {code}
> Writing to cgroup task files...
> Creating local dirs...
> Can't open 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh
>  for output - File exists
> Getting exit code file...
> Creating script paths...
> Full command array for failed execution:
> [/usr/local/hadoop-3.2.0-SNAPSHOT/bin/container-executor, hbase, hbase, 1, 
> application_1524242413029_0001, container_1524242413029_0001_01_02, 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/container_1524242413029_0001_01_02.tokens,
>  
> /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1524242413029_0001/container_1524242413029_0001_01_02/container_1524242413029_0001_01_02.pid,
>  /tmp/hadoop-yarn/nm-local-dir, 
> /usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs, cgroups=none]
> 2018-04-20 16:50:16,641 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Launch container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=33: Could not create copy file 3 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh
> Could not create local files and directories
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:118)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:477)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:492)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: ExitCodeException exitCode=33: Could not create copy file 3 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1524242413029_0001/container_1524242413029_0001_01_02/launch_container.sh
> Could not create local files and directories
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at 
> 

[jira] [Commented] (YARN-8163) Add support for Node Labels in opportunistic scheduling.

2018-05-02 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461758#comment-16461758
 ] 

Giovanni Matteo Fumarola commented on YARN-8163:


Thanks [~abmodi] for the patch.

A few comments:
 * _RemoteNodePBImpl#getNodePartition_ should return null and not an empty 
string.
 * nodePartition in _yarn_server_common_service_protos_ should be 
node_partition.
 * In _OpportunisticContainerAllocatorAMService#allocate_, 
_partitionedAsks.getOpportunistic()_ can be null. Please add a null check.
 * Please revert the deletion of the new line in 
_OpportunisticContainerAllocatorAMService#handle_.
 * Add comments to the JUnit test.

> Add support for Node Labels in opportunistic scheduling.
> 
>
> Key: YARN-8163
> URL: https://issues.apache.org/jira/browse/YARN-8163
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8163.002.patch, YARN-8163.patch
>
>
> Currently, the Opportunistic Scheduler doesn't honor node label constraints 
> and schedules containers based only on locality and load constraints. This Jira 
> is to add support in opportunistic scheduling to honor node labels in resource 
> requests.






[jira] [Commented] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-02 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461751#comment-16461751
 ] 

Eric Yang commented on YARN-8206:
-

From today's meeting, signal handling is run as the user who submitted the job. 
This is likely to break with YARN-7221, where a privileged docker container can 
run as a different user than the one who submitted the job. Some work is 
required to handle this corner case.

> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it expects the process to die immediately rather 
> than sit around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 
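
To make the requested separation concrete, a minimal sketch that handles SIGKILL 
and SIGTERM differently; the Signal enum mirrors ContainerExecutor.Signal, but the 
stop/kill helpers and the 10-second grace period are assumptions, not the current 
implementation:

{code:java}
// Sketch only: KILL reclaims resources immediately, TERM allows a graceful stop.
public class DockerSignalSketch {
  enum Signal { TERM, KILL, QUIT }

  static void handleSignal(String containerId, Signal signal) {
    if (Signal.KILL.equals(signal)) {
      // SIGKILL: die immediately, no grace period.
      dockerKill(containerId);
    } else if (Signal.TERM.equals(signal)) {
      // SIGTERM: give the container a chance to shut down cleanly.
      dockerStop(containerId, 10 /* grace period in seconds */);
    } else {
      dockerSignal(containerId, signal);
    }
  }

  static void dockerKill(String id) { System.out.println("docker kill " + id); }
  static void dockerStop(String id, int grace) {
    System.out.println("docker stop --time=" + grace + " " + id);
  }
  static void dockerSignal(String id, Signal s) {
    System.out.println("docker kill --signal=" + s + " " + id);
  }

  public static void main(String[] args) {
    handleSignal("container_1234_0001_01_000002", Signal.KILL);
    handleSignal("container_1234_0001_01_000002", Signal.TERM);
  }
}
{code}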



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7715) Update CPU and Memory cgroups params on container update as well.

2018-05-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461741#comment-16461741
 ] 

Miklos Szegedi commented on YARN-7715:
--

I am working on a preliminary patch to discuss. Do you think we should reuse 
reacquireContainer for the apply logic, or create a separate apply method?

> Update CPU and Memory cgroups params on container update as well.
> -
>
> Key: YARN-7715
> URL: https://issues.apache.org/jira/browse/YARN-7715
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Miklos Szegedi
>Priority: Major
>
> In YARN-6673 and YARN-6674, the cgroups resource handlers update the cgroups 
> params for the containers, based on opportunistic or guaranteed, in the 
> *preStart* method.
> Now that YARN-5085 is in, Container executionType (as well as the cpu, memory 
> and any other resources) can be updated after the container has started. This 
> means we need the ability to change cgroups params after container start.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7715) Update CPU and Memory cgroups params on container update as well.

2018-05-02 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi reassigned YARN-7715:


Assignee: Miklos Szegedi

> Update CPU and Memory cgroups params on container update as well.
> -
>
> Key: YARN-7715
> URL: https://issues.apache.org/jira/browse/YARN-7715
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Miklos Szegedi
>Priority: Major
>
> In YARN-6673 and YARN-6674, the cgroups resource handlers update the cgroups 
> params for the containers, based on opportunistic or guaranteed, in the 
> *preStart* method.
> Now that YARN-5085 is in, Container executionType (as well as the cpu, memory 
> and any other resources) can be updated after the container has started. This 
> means we need the ability to change cgroups params after container start.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7729) Add support for setting the PID namespace mode

2018-05-02 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7729:

Fix Version/s: 3.1.0

> Add support for setting the PID namespace mode
> --
>
> Key: YARN-7729
> URL: https://issues.apache.org/jira/browse/YARN-7729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Shane Kumpf
>Assignee: Billie Rinaldi
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7729.001.patch, YARN-7729.002.patch, 
> YARN-7729.003.patch
>
>
> Docker has support for allowing containers to share the PID namespace with 
> the host or other containers via the {{docker run --pid}} flag.
> There are a number of use cases where this is desirable:
> * Monitoring tools running in containers that need access to the host level 
> PIDs.
> * Debug containers that can attach to another container to run strace, gdb, 
> etc.
> * Testing Docker on YARN in a container, where the docker socket is bind 
> mounted.
> Enabling this feature should be considered privileged as it exposes host 
> details inside the container.
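
As an illustration of how such a request could be translated into the docker 
invocation, a small sketch follows; the environment variable name and the admin 
gate are assumptions for this sketch, not the committed behavior:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: translate a container-requested PID namespace into a
// docker run flag, gated by an admin setting.
public class PidNamespaceSketch {
  static List<String> pidNamespaceArgs(Map<String, String> containerEnv,
                                       boolean adminAllowsHostPidNamespace) {
    List<String> args = new ArrayList<>();
    String requested = containerEnv.get("YARN_CONTAINER_RUNTIME_DOCKER_PID_NAMESPACE");
    if ("host".equals(requested)) {
      if (!adminAllowsHostPidNamespace) {
        throw new IllegalStateException("host PID namespace is disabled by the admin");
      }
      // Privileged-ish: the container will see all host processes.
      args.add("--pid=host");
    }
    return args;
  }

  public static void main(String[] args) {
    System.out.println(pidNamespaceArgs(
        Map.of("YARN_CONTAINER_RUNTIME_DOCKER_PID_NAMESPACE", "host"), true));
  }
}
{code}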



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8241) MRAppMaster fails when using UID:GID pair within docker container

2018-05-02 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461706#comment-16461706
 ] 

Eric Yang edited comment on YARN-8241 at 5/2/18 10:53 PM:
--

There are two possible solutions for this problem:

Option 1) Automatically detect existence of sssd or nscd socket, and bind-mount 
the socket into container.

*Pros*
 Simple to implement. [Online 
tutorial|https://jhrozek.wordpress.com/2015/03/31/authenticating-a-docker-container-against-hosts-unix-accounts/]
 covers how to do this.
 *Cons*
 The image must be built with sssd client or nscd libraries for pam to work in 
addition to Kerberos setup.

Option 2) Fix UserGroupInformation logic to map to Kerberos subject principal 
name instead of Unix Principal name. This will allow high level java code to 
work without username and group name.

*Pros*
 Less dependencies. Krb5.conf and keytab are only requirement for this to work.
 *Cons*
 Works for Hadoop related java code, does not work with non-Hadoop workload.


was (Author: eyang):
There are two possible solutions for this problem:

Option 1) Automatically detect existence of sssd or nscd socket, and bind-mount 
the socket into container.

*Pros*
 Simple to implement. [Online 
tutorial|https://jhrozek.wordpress.com/2015/03/31/authenticating-a-docker-container-against-hosts-unix-accounts/]
 covers how to do this.
*Cons*
 The image must be built with sssd client or nscd libraries for pam to work in 
addition to Kerberos setup.

Option 2) Fix UserGroupInformation logic to map to Kerberos subject principal 
name instead of Unix Principal name. This will allow high level java code to 
work without username and group name.

*Pros*
 Less dependencies. Krb5.conf and keytab are only requirement for this ti work.
 *Cons*
 Works for Hadoop related java code, does not work with non-Hadoop workload.

> MRAppMaster fails when using UID:GID pair within docker container
> -
>
> Key: YARN-8241
> URL: https://issues.apache.org/jira/browse/YARN-8241
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> As mentioned in [this 
> comment|https://issues.apache.org/jira/browse/YARN-4266?focusedCommentId=16063931=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16063931],
>  the MRAppMaster fails for docker containers if there is no additional user 
> lookup strategy (e.g. bind-mounting /var/run/nscd or /etc/passwd). We need a 
> better solution so that users can still run even if they are not known inside 
> of the container by name



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8241) MRAppMaster fails when using UID:GID pair within docker container

2018-05-02 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461706#comment-16461706
 ] 

Eric Yang commented on YARN-8241:
-

There are two possible solutions for this problem:

Option 1) Automatically detect the existence of an sssd or nscd socket, and 
bind-mount the socket into the container (a rough sketch follows below).

*Pros*
 Simple to implement. [Online 
tutorial|https://jhrozek.wordpress.com/2015/03/31/authenticating-a-docker-container-against-hosts-unix-accounts/]
 covers how to do this.
*Cons*
 The image must be built with the sssd client or nscd libraries for pam to work, in 
addition to the Kerberos setup.

Option 2) Fix the UserGroupInformation logic to map to the Kerberos subject principal 
name instead of the Unix principal name. This will allow high-level java code to 
work without a username and group name.

*Pros*
 Fewer dependencies. krb5.conf and a keytab are the only requirements for this to work.
 *Cons*
 Works for Hadoop-related java code, but does not work with non-Hadoop workloads.
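
A rough sketch of Option 1, assuming the typical default socket locations 
(/var/run/nscd/socket for nscd, /var/lib/sss/pipes for sssd); the paths and helper 
names are assumptions, not the eventual patch:

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Probe for an nscd/sssd socket on the host and, if present, add a read-only
// bind mount to the docker run argument list.
public class UserLookupMountSketch {
  static final String NSCD_SOCKET = "/var/run/nscd/socket";
  static final String SSSD_PIPES = "/var/lib/sss/pipes";

  static List<String> userLookupMounts() {
    List<String> args = new ArrayList<>();
    for (String hostPath : new String[] {NSCD_SOCKET, SSSD_PIPES}) {
      Path p = Paths.get(hostPath);
      if (Files.exists(p)) {
        // Mount read-only so the container can resolve users via the host daemon.
        args.add("-v");
        args.add(hostPath + ":" + hostPath + ":ro");
      }
    }
    return args;
  }

  public static void main(String[] args) {
    System.out.println("extra docker run args: " + userLookupMounts());
  }
}
{code}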

> MRAppMaster fails when using UID:GID pair within docker container
> -
>
> Key: YARN-8241
> URL: https://issues.apache.org/jira/browse/YARN-8241
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> As mentioned in [this 
> comment|https://issues.apache.org/jira/browse/YARN-4266?focusedCommentId=16063931=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16063931],
>  the MRAppMaster fails for docker containers if there is no additional user 
> lookup strategy (e.g. bind-mounting /var/run/nscd or /etc/passwd). We need a 
> better solution so that users can still run even if they are not known inside 
> of the container by name



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8209) NPE in DeletionService

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8209:
--
Labels: Docker  (was: )

> NPE in DeletionService
> --
>
> Key: YARN-8209
> URL: https://issues.apache.org/jira/browse/YARN-8209
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Eric Badger
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8209.001.patch, YARN-8209.002.patch, 
> YARN-8209.003.patch, YARN-8209.004.patch, YARN-8209.005.patch
>
>
> {code:java}
> 2018-04-25 23:38:41,039 WARN  concurrent.ExecutorHelper 
> (ExecutorHelper.java:logThrowableFromAfterExecute(63)) - Caught exception in 
> thread DeletionService #1:
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerClient.writeCommandToTempFile(DockerClient.java:109)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:85)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:192)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:128)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:935)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3611) Support Docker Containers In LinuxContainerExecutor

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3611:
--
Labels: Docker  (was: )

> Support Docker Containers In LinuxContainerExecutor
> ---
>
> Key: YARN-3611
> URL: https://issues.apache.org/jira/browse/YARN-3611
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
>
> Support Docker Containers In LinuxContainerExecutor
> LinuxContainerExecutor provides useful functionality today with respect to 
> localization, cgroups based resource management and isolation for CPU, 
> network, disk etc. as well as security with a well-defined mechanism to 
> execute privileged operations using the container-executor utility.  Bringing 
> docker support to LinuxContainerExecutor lets us use all of this 
> functionality when running docker containers under YARN, while not requiring 
> users and admins to configure and use a different ContainerExecutor. 
> There are several aspects here that need to be worked through :
> * Mechanism(s) to let clients request docker-specific functionality - we 
> could initially implement this via environment variables without impacting 
> the client API.
> * Security - both docker daemon as well as application
> * Docker image localization
> * Running a docker container via container-executor as a specified user
> * “Isolate” the docker container in terms of CPU/network/disk/etc
> * Communicating with and/or signaling the running container (ensure correct 
> pid handling)
> * Figure out workarounds for certain performance-sensitive scenarios like 
> HDFS short-circuit reads 
> * All of these need to be achieved without changing the current behavior of 
> LinuxContainerExecutor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7814) Remove automatic mounting of the cgroups root directory into Docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7814:
--
Labels: Docker  (was: )

> Remove automatic mounting of the cgroups root directory into Docker containers
> --
>
> Key: YARN-7814
> URL: https://issues.apache.org/jira/browse/YARN-7814
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7814.001.patch, YARN-7814.002.patch
>
>
> Currently, all Docker containers launched by {{DockerLinuxContainerRuntime}} 
> get /sys/fs/cgroup automatically mounted. Now that user supplied mounts 
> (YARN-5534) are in, containers that require this mount can request it (with a 
> properly configured mount whitelist).
> I propose we remove the automatic mounting of /sys/fs/cgroup into Docker 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7233) Make the cgroup mount into Docker containers configurable

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7233:
--
Labels: Docker  (was: )

> Make the cgroup mount into Docker containers configurable
> -
>
> Key: YARN-7233
> URL: https://issues.apache.org/jira/browse/YARN-7233
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Miklos Szegedi
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
>
> Not all containers need this mount. There should be an option to opt for 
> lxcfs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6988) container-executor fails for docker when command length > 4096 B

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6988:
--
Labels: Docker  (was: )

> container-executor fails for docker when command length > 4096 B
> 
>
> Key: YARN-6988
> URL: https://issues.apache.org/jira/browse/YARN-6988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2
>
> Attachments: YARN-6988-branch-2.002.patch, 
> YARN-6988-branch-2.8.002.patch, YARN-6988.001.patch, YARN-6988.002.patch
>
>
> {{run_docker}} and {{launch_docker_container_as_user}} allocate their command 
> arrays using EXECUTOR_PATH_MAX, which is hardcoded to 4096 in 
> configuration.h. Because of this, the full docker command can only be 4096 
> characters. If it is longer, it will be truncated and the command will fail 
> with a parsing error. Because of the bind-mounting of volumes, the arguments 
> to the docker command can quickly get large. For example, I passed the 4096 
> limit with an 11 disk node. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5258) Document Use of Docker with LinuxContainerExecutor

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5258:
--
Labels: Docker oct16-easy  (was: oct16-easy)

> Document Use of Docker with LinuxContainerExecutor
> --
>
> Key: YARN-5258
> URL: https://issues.apache.org/jira/browse/YARN-5258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: Docker, oct16-easy
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: YARN-5258.001.patch, YARN-5258.002.patch, 
> YARN-5258.003.patch, YARN-5258.004.patch, YARN-5258.005.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all 
> of its various options aside from reading all of the JIRAs.  We need to 
> document the configuration, use, and troubleshooting, along with helpful 
> examples.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3854) Add localization support for docker images

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3854:
--
Labels: Docker  (was: )

> Add localization support for docker images
> --
>
> Key: YARN-3854
> URL: https://issues.apache.org/jira/browse/YARN-3854
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-3854-branch-2.8.001.patch, 
> YARN-3854_Localization_support_for_Docker_image_v1.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v2.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v3.pdf
>
>
> We need the ability to localize docker images when those images aren't 
> already available locally. There are various approaches that could be used 
> here, with different trade-offs/issues: image archives on HDFS + docker load, 
> docker pull during the localization phase, or (automatic) docker pull 
> during the run/launch phase. 
> We also need the ability to clean-up old/stale, unused images. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4255) container-executor does not clean up docker operation command files.

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4255:
--
Labels: Docker  (was: )

> container-executor does not clean up docker operation command files. 
> -
>
> Key: YARN-4255
> URL: https://issues.apache.org/jira/browse/YARN-4255
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Minor
>  Labels: Docker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4255.001.patch
>
>
> container-executor leaves behind docker command files that are used to run 
> docker commands. These need to be cleaned up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7232) Consider /sys/fs/cgroup as the default CGroup mount path

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7232:
--
Labels: Docker  (was: )

> Consider /sys/fs/cgroup as the default CGroup mount path
> 
>
> Key: YARN-7232
> URL: https://issues.apache.org/jira/browse/YARN-7232
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Miklos Szegedi
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
>
> YARN-6968 fixed the findbugs issue caused by the hard-coded /sys/fs/cgroups 
> mount path for Docker containers, but in doing so it removed the default 
> value. This jira is a followup to make sure the admin does not have to set the 
> value every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5366) Improve handling of the Docker container life cycle

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5366:
--
Labels: Docker oct16-medium  (was: oct16-medium)

> Improve handling of the Docker container life cycle
> ---
>
> Key: YARN-5366
> URL: https://issues.apache.org/jira/browse/YARN-5366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker, oct16-medium
> Fix For: 3.1.0
>
> Attachments: YARN-5366.001.patch, YARN-5366.002.patch, 
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch, 
> YARN-5366.006.patch, YARN-5366.007.patch, YARN-5366.008.patch, 
> YARN-5366.009.patch, YARN-5366.010.patch
>
>
> There are several paths that need to be improved with regard to the Docker 
> container lifecycle when running Docker containers on YARN.
> 1) Provide the ability to keep a container on the NodeManager for a set 
> period of time for debugging purposes.
> 2) Support sending signals to the process in the container to allow for 
> triggering stack traces, heap dumps, etc.
> 3) Support for Docker's live restore, which means moving away from the use of 
> {{docker wait}}. (YARN-5818)
> 4) Improve the resiliency of liveliness checks (kill -0) by adding retries.
> 5) Improve the resiliency of container removal by adding retries.
> 6) Only attempt to stop, kill, and remove containers if the current container 
> state allows for it.
> 7) Better handling of short lived containers when the container is stopped 
> before the PID can be retrieved. (YARN-6305)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7914) Fix exit code handling for short lived Docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7914:
--
Labels: Docker  (was: )

> Fix exit code handling for short lived Docker containers
> 
>
> Key: YARN-7914
> URL: https://issues.apache.org/jira/browse/YARN-7914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Critical
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7914.001.patch
>
>
> Currently, if container-executor is unable to obtain the PID for a short-lived 
> Docker container, the exit code will not be properly obtained via {{docker inspect}}. 
> This results in containers successfully completing when they should fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8029) YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS should not use commas as separators

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8029:
--
Labels: Docker  (was: )

> YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS should not use commas as separators
> 
>
> Key: YARN-8029
> URL: https://issues.apache.org/jira/browse/YARN-8029
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8029.001.patch, YARN-8029.002.patch
>
>
> The following docker-related environment variables specify a comma-separated 
> list of mounts:
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS
> YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS
> This is a problem because hadoop -Dmapreduce.map.env and related options use a 
> comma as a delimiter. So if I put more than one mount in 
> YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS, the comma in the variable will be 
> treated as a delimiter for the hadoop command line option and all but the 
> first mount will be ignored.
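
A small demonstration of the clash described above, using a naive comma split as a 
stand-in for the real option parsing:

{code:java}
// Illustration of the delimiter clash: a comma split of the mapreduce.map.env
// value loses every mount after the first one.
public class MountDelimiterSketch {
  public static void main(String[] args) {
    String mapEnv = "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro,"
        + "/data1:/data1:rw";
    // Each comma-separated token is treated as its own KEY=VALUE pair.
    for (String token : mapEnv.split(",")) {
      System.out.println("parsed env entry: " + token);
    }
    // The second mount ends up as a bogus entry "/data1:/data1:rw" instead of
    // being part of YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS.
  }
}
{code}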



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6726) Fix issues with docker commands executed by container-executor

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6726:
--
Labels: Docker  (was: )

> Fix issues with docker commands executed by container-executor
> --
>
> Key: YARN-6726
> URL: https://issues.apache.org/jira/browse/YARN-6726
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6726.001.patch, YARN-6726.002.patch, 
> YARN-6726.003.patch
>
>
> docker inspect, rm, stop, etc are issued through container-executor. Commands 
> other than docker run are not functioning properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7848) Force removal of docker containers that do not get removed on first try

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7848:
--
Labels: Docker  (was: )

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> After the addition of YARN-5366, containers will get removed after a certain 
> debug delay. However, this is a one-time effort. If the removal fails for 
> whatever reason, the container will persist. We need to add a mechanism for a 
> forced removal of those containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7729) Add support for setting the PID namespace mode

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7729:
--
Labels: Docker  (was: )

> Add support for setting the PID namespace mode
> --
>
> Key: YARN-7729
> URL: https://issues.apache.org/jira/browse/YARN-7729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Shane Kumpf
>Assignee: Billie Rinaldi
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7729.001.patch, YARN-7729.002.patch, 
> YARN-7729.003.patch
>
>
> Docker has support for allowing containers to share the PID namespace with 
> the host or other containers via the {{docker run --pid}} flag.
> There are a number of use cases where this is desirable:
> * Monitoring tools running in containers that need access to the host level 
> PIDs.
> * Debug containers that can attach to another container to run strace, gdb, 
> etc.
> * Testing Docker on YARN in a container, where the docker socket is bind 
> mounted.
> Enabling this feature should be considered privileged as it exposes host 
> details inside the container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7960) Add no-new-privileges flag to docker run

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7960:
--
Labels: Docker  (was: )

> Add no-new-privileges flag to docker run
> 
>
> Key: YARN-7960
> URL: https://issues.apache.org/jira/browse/YARN-7960
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> Minimally, this should be used for unprivileged containers. It's a cheap way 
> to add an extra layer of security to the docker model. For privileged 
> containers, it might be appropriate to omit this flag.
> https://github.com/moby/moby/pull/20727



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3853) Add docker container runtime support to LinuxContainterExecutor

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3853:
--
Labels: Docker  (was: )

> Add docker container runtime support to LinuxContainterExecutor
> ---
>
> Key: YARN-3853
> URL: https://issues.apache.org/jira/browse/YARN-3853
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-3853.001.patch, YARN-3853.002.patch
>
>
> Create a new DockerContainerRuntime that implements support for docker 
> containers via container-executor. LinuxContainerExecutor should default to 
> current behavior when launching containers but switch to docker when 
> requested. 
> Overview
> ===
> The current mechanism of launching/signaling containers is moved to its own 
> (default) container runtime. In order to use docker container runtime a 
> couple of environment variables have to be set. This will have to be 
> revisited when we have a first class client side API to specify different 
> container types and associated parameters. Using ‘pi’ as an example and using 
> a custom docker image, this is how you could use the docker container runtime 
> (LinuxContainerExecutor must be in use and the docker daemon needs to be 
> running) :
> {code}
> export 
> YARN_EXAMPLES_JAR=./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
> bin/yarn jar $YARN_EXAMPLES_JAR pi 
> -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=ashahab/hadoop-trunk"
>  
> -Dyarn.app.mapreduce.am.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=ashahab/hadoop-trunk"
>   
> -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=ashahab/hadoop-trunk"
>  4 1000
> {code}
>  
> LinuxContainerExecutor can delegate to either runtime on a per container 
> basis. If the docker container type is selected, LinuxContainerExecutor 
> delegates to the DockerContainerRuntime which in turn uses docker support in 
> the container-executor binary to launch/manage docker containers ( see 
> YARN-3852 ) . 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7916) Remove call to docker logs on failure in container-executor

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7916:
--
Labels: Docker  (was: )

> Remove call to docker logs on failure in container-executor
> ---
>
> Key: YARN-7916
> URL: https://issues.apache.org/jira/browse/YARN-7916
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7916.001.patch
>
>
> If a Docker container fails with a non-zero exit code, container-executor 
> attempts to run a {{docker logs --tail=250 container_name}} to provide more 
> details on why the container failed. While the idea is good, the current 
> implementation will fail for most containers as they are leveraging a launch 
> script whose output will be redirected to a file. The {{--tail}} option 
> throws an error if no log output is available for the container, resulting in 
> the docker logs command returning rc=1 in most cases.
> I propose we remove this code from container-executor. Alternative approaches 
> to handle logging can be explored as part of supporting an image's entrypoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6919) Add default volume mount list

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6919:
--
Labels: Docker  (was: )

> Add default volume mount list
> -
>
> Key: YARN-6919
> URL: https://issues.apache.org/jira/browse/YARN-6919
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
>
> Piggybacking on YARN-5534, we should create a default list that bind mounts 
> selected volumes into all docker containers. This list will be empty by 
> default 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5299) Log Docker run command when container fails

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5299:
--
Labels: Docker  (was: )

> Log Docker run command when container fails
> ---
>
> Key: YARN-5299
> URL: https://issues.apache.org/jira/browse/YARN-5299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5299.001.patch
>
>
> It's useful to have the docker run command logged when containers fail to 
> help debugging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7935) Expose container's hostname to applications running within the docker container

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7935:
--
Labels: Docker  (was: )

> Expose container's hostname to applications running within the docker 
> container
> ---
>
> Key: YARN-7935
> URL: https://issues.apache.org/jira/browse/YARN-7935
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7935.1.patch, YARN-7935.2.patch, YARN-7935.3.patch
>
>
> Some applications (like Spark) need to bind to the container's hostname, which 
> is different from the NodeManager's hostname (NM_HOST, which is available as an 
> env during container launch) when launched through the Docker runtime. The 
> container's hostname can be exposed to applications via an env 
> CONTAINER_HOSTNAME. Another potential candidate is the container's IP, but 
> this can be addressed in a separate jira.
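
A sketch of what exposing that value could look like; the CONTAINER_HOSTNAME name 
comes from the description above, while the surrounding launch plumbing is assumed:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: the runtime picks a hostname, passes it to docker via --hostname,
// and exposes the same value to the application through CONTAINER_HOSTNAME.
public class ContainerHostnameSketch {
  static void exposeHostname(String hostname, List<String> dockerArgs,
                             Map<String, String> containerEnv) {
    dockerArgs.add("--hostname");
    dockerArgs.add(hostname);
    containerEnv.put("CONTAINER_HOSTNAME", hostname);
  }

  public static void main(String[] args) {
    List<String> dockerArgs = new ArrayList<>();
    Map<String, String> env = new HashMap<>();
    // example hostname value only
    exposeHostname("container-host-1.example.com", dockerArgs, env);
    System.out.println(dockerArgs + " " + env);
  }
}
{code}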



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7446) Docker container privileged mode and --user flag contradict each other

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7446:
--
Labels: Docker  (was: )

> Docker container privileged mode and --user flag contradict each other
> --
>
> Key: YARN-7446
> URL: https://issues.apache.org/jira/browse/YARN-7446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7446.001.patch, YARN-7446.002.patch, 
> YARN-7446.003.patch, YARN-7446.004.patch
>
>
> In the current implementation, when privileged=true, the --user flag is also 
> passed to docker when launching the container.  In reality, the container has no 
> way to use root privileges unless there is a sticky bit or a sudoers entry in the 
> image that lets the specified user gain privileges again.  To avoid dropping and 
> then reacquiring root privileges, we can stop specifying both flags together.  
> When privileged mode is enabled, the --user flag should be omitted.  When 
> non-privileged mode is enabled, the --user flag is supplied.
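
A compact sketch of the proposed rule (privileged implies no --user, otherwise pass 
uid:gid); the helper shown is hypothetical, not the runtime's actual code:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Privileged containers get --privileged and no --user; non-privileged
// containers get --user uid:gid and no --privileged.
public class PrivilegedUserFlagSketch {
  static List<String> userFlags(boolean privileged, String uid, String gid) {
    List<String> args = new ArrayList<>();
    if (privileged) {
      args.add("--privileged");
      // --user is intentionally omitted; it would only be dropped again inside.
    } else {
      args.add("--user");
      args.add(uid + ":" + gid);
    }
    return args;
  }

  public static void main(String[] args) {
    System.out.println(userFlags(true, "1000", "1000"));   // [--privileged]
    System.out.println(userFlags(false, "1000", "1000"));  // [--user, 1000:1000]
  }
}
{code}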



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4643) Container recovery is broken with delegating container runtime

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4643:
--
Labels: Docker  (was: )

> Container recovery is broken with delegating container runtime
> --
>
> Key: YARN-4643
> URL: https://issues.apache.org/jira/browse/YARN-4643
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Critical
>  Labels: Docker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4643.001.patch, YARN-4643.002.patch
>
>
> Delegating container runtime uses the container's launch context to determine 
> which runtime to use. However, during container recovery, a container object 
> is not passed as input which leads to a {{NullPointerException}} when 
> attempting to access the container's launch context.   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7999:
--
Labels: Docker  (was: )

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch, q3.log
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.
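
One possible mitigation, shown only as a sketch and not necessarily the committed 
fix, is to ensure the user's private filecache directory exists before it is offered 
as a bind mount:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Make sure the user's private filecache directory exists before it is
// handed to docker as a bind mount.
public class FilecacheMountSketch {
  static Path ensureFilecacheDir(String localDir, String user) throws IOException {
    Path filecache = Paths.get(localDir, "usercache", user, "filecache");
    // createDirectories is a no-op if the directory is already there.
    return Files.createDirectories(filecache);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(ensureFilecacheDir("/tmp/hadoop-yarn/nm-local-dir", "hbase"));
  }
}
{code}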



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7723) Avoid using docker volume --format option to run against older docker releases

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7723:
--
Labels: Docker  (was: )

> Avoid using docker volume --format option to run against older docker releases
> --
>
> Key: YARN-7723
> URL: https://issues.apache.org/jira/browse/YARN-7723
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7723.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4004) container-executor should print output of docker logs if the docker container exits with non-0 exit status

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4004:
--
Labels: Docker  (was: )

> container-executor should print output of docker logs if the docker container 
> exits with non-0 exit status
> --
>
> Key: YARN-4004
> URL: https://issues.apache.org/jira/browse/YARN-4004
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Major
>  Labels: Docker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4004.001.patch, YARN-4004.002.patch, 
> YARN-4004.003.patch
>
>
> When a docker container exits with a non-0 exit code, we should print the 
> docker logs to make debugging easier.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7810) TestDockerContainerRuntime test failures due to UID lookup of a non-existent user

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7810:
--
Labels: Docker  (was: )

> TestDockerContainerRuntime test failures due to UID lookup of a non-existent 
> user
> -
>
> Key: YARN-7810
> URL: https://issues.apache.org/jira/browse/YARN-7810
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2
>
> Attachments: YARN-7810-branch-2.001.patch, 
> YARN-7810-branch-2.002.patch, YARN-7810-branch-3.0.001.patch, 
> YARN-7810.001.patch, YARN-7810.002.patch
>
>
> YARN-7782 enabled the Docker runtime feature to remap the username to uid:gid 
> form for launching Docker containers. The feature does an {{id -u}} and {{id 
> -G}} to get the UID and GIDs. This fails with the test user, as that user 
> doesn't actually exist on the host.
> {code:java}
> [ERROR] 
> testContainerLaunchWithCustomNetworks(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime)
>   Time elapsed: 0.411 s  <<< ERROR!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  
> ExitCodeException exitCode=1: id: 'run_as_user': no such user
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.getUserIdInfo(DockerLinuxContainerRuntime.java:711)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:757)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime.testContainerLaunchWithCustomNetworks(TestDockerContainerRuntime.java:599){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6494) add mounting of HDFS Short-Circuit path for docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6494:
--
Labels: Docker  (was: )

> add mounting of HDFS Short-Circuit path for docker containers
> -
>
> Key: YARN-6494
> URL: https://issues.apache.org/jira/browse/YARN-6494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jaeboo Jeong
>Assignee: Jaeboo Jeong
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6494.001.patch, YARN-6494.002.patch
>
>
> Currently there is an error message about HDFS short-circuit reads when a docker 
> container starts.
> {code}
> WARN [main] org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory: error 
> creating DomainSocket
> java.net.ConnectException: connect(2) error: No such file or directory when 
> trying to connect to ‘xxx’
> at org.apache.hadoop.net.unix.DomainSocket.connect0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.connect(DomainSocket.java:250)
> at 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.createSocket(DomainSocketFactory.java:164)
> at 
> org.apache.hadoop.hdfs.BlockReaderFactory.nextDomainPeer(BlockReaderFactory.java:752)
> ...
> {code}
> If dfs.client.read.shortcircuit is true and dfs.domain.socket.path isn't 
> empty, we need to mount a volume for the short-circuit socket path.
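
A sketch of that condition using org.apache.hadoop.conf.Configuration; how the 
resulting mount list is consumed afterwards is assumed here, and the socket path in 
the usage example is only a typical value:

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

// When short-circuit reads are enabled and a domain socket path is configured,
// bind-mount that path into the container.
public class ShortCircuitMountSketch {
  static void addShortCircuitMount(Configuration conf, List<String> dockerArgs) {
    boolean shortCircuit = conf.getBoolean("dfs.client.read.shortcircuit", false);
    String socketPath = conf.get("dfs.domain.socket.path", "");
    if (shortCircuit && !socketPath.isEmpty()) {
      dockerArgs.add("-v");
      dockerArgs.add(socketPath + ":" + socketPath);
    }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
    List<String> dockerArgs = new ArrayList<>();
    addShortCircuitMount(conf, dockerArgs);
    System.out.println(dockerArgs);
  }
}
{code}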



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4553) Add cgroups support for docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4553:
--
Labels: Docker  (was: )

> Add cgroups support for docker containers
> -
>
> Key: YARN-4553
> URL: https://issues.apache.org/jira/browse/YARN-4553
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-4553.001.patch, YARN-4553.002.patch, 
> YARN-4553.003.patch
>
>
> Currently, cgroups-based resource isolation does not work with docker 
> containers under YARN. The processes in these containers are launched by the 
> docker daemon and they are not children of a container-executor process. 
> Docker supports a --cgroup-parent flag which can be used to point to the 
> container-specific cgroups that are created by the nodemanager. This will 
> allow the Nodemanager to manage cgroups (as it does today) while allowing 
> resource isolation to work with docker containers. 
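
To make the mechanism concrete, a sketch of building the --cgroup-parent argument 
from the NodeManager's per-container cgroup; the /hadoop-yarn prefix is an example 
value, normally taken from the configured cgroups hierarchy:

{code:java}
// Point docker at the NM-created, per-container cgroup so resource isolation
// keeps working for docker-launched processes.
public class CgroupParentSketch {
  static String cgroupParentArg(String hierarchyPrefix, String containerId) {
    // e.g. /hadoop-yarn/container_1524242413029_0001_01_000002
    return "--cgroup-parent=" + hierarchyPrefix + "/" + containerId;
  }

  public static void main(String[] args) {
    System.out.println(cgroupParentArg("/hadoop-yarn",
        "container_1524242413029_0001_01_000002"));
  }
}
{code}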



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5596) Fix failing unit test in TestDockerContainerRuntime

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5596:
--
Labels: Docker  (was: )

> Fix failing unit test in TestDockerContainerRuntime
> ---
>
> Key: YARN-5596
> URL: https://issues.apache.org/jira/browse/YARN-5596
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Minor
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5596.001.patch, YARN-5596.002.patch
>
>
> /sys/fs/cgroup doesn't exist on Mac OS X. And the tests seem to fail because 
> of this. 
> {code}
> Failed tests:
>   TestDockerContainerRuntime.testContainerLaunchWithCustomNetworks:456 
> expected:<...ET_BIND_SERVICE -v /[sys/fs/cgroup:/sys/fs/cgroup:ro -v 
> /]test_container_local...> but was:<...ET_BIND_SERVICE -v 
> /[]test_container_local...>
>   TestDockerContainerRuntime.testContainerLaunchWithNetworkingDefaults:401 
> expected:<...ET_BIND_SERVICE -v /[sys/fs/cgroup:/sys/fs/cgroup:ro -v 
> /]test_container_local...> but was:<...ET_BIND_SERVICE -v 
> /[]test_container_local...>
>   TestDockerContainerRuntime.testDockerContainerLaunch:297 
> expected:<...ET_BIND_SERVICE -v /[sys/fs/cgroup:/sys/fs/cgroup:ro -v 
> /]test_container_local...> but was:<...ET_BIND_SERVICE -v 
> /[]test_container_local...>
> Tests run: 19, Failures: 3, Errors: 0, Skipped: 0
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6623) Add support to turn off launching privileged containers in the container-executor

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6623:
--
Labels: Docker  (was: )

> Add support to turn off launching privileged containers in the 
> container-executor
> -
>
> Key: YARN-6623
> URL: https://issues.apache.org/jira/browse/YARN-6623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-6623-branch-2.013.patch, 
> YARN-6623-branch-2.014.patch, YARN-6623-branch-2.015.patch, 
> YARN-6623.001.patch, YARN-6623.002.patch, YARN-6623.003.patch, 
> YARN-6623.004.patch, YARN-6623.005.patch, YARN-6623.006.patch, 
> YARN-6623.007.patch, YARN-6623.008.patch, YARN-6623.009.patch, 
> YARN-6623.010.patch, YARN-6623.011.patch, YARN-6623.012.patch, 
> YARN-6623.013.patch, cetest.stderr, cetest.stdout
>
>
> Currently, launching privileged containers is controlled by the NM. We should 
> add a flag to the container-executor.cfg allowing admins to disable launching 
> privileged containers at the container-executor level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7025) Make CGROUPS_ROOT_DIRECTORY configurable in DockerLinuxContainerRuntime

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7025:
--
Labels: Docker  (was: )

> Make CGROUPS_ROOT_DIRECTORY configurable in DockerLinuxContainerRuntime
> ---
>
> Key: YARN-7025
> URL: https://issues.apache.org/jira/browse/YARN-7025
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6374) Improve test coverage and add utility classes for common Docker operations

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6374:
--
Labels: Docker  (was: )

> Improve test coverage and add utility classes for common Docker operations
> --
>
> Key: YARN-6374
> URL: https://issues.apache.org/jira/browse/YARN-6374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: YARN-6374-branch-2.001.patch, 
> YARN-6374-branch-2.002.patch, YARN-6374.001.patch, YARN-6374.002.patch, 
> YARN-6374.003.patch
>
>
> Currently, it is tedious to execute Docker related operations due to the 
> plumbing needed to define the DockerCommand, writing the command file, 
> configuring privileged operation, and finally executing the command and 
> validating the result. Obtaining the current status of a Docker container can 
> also be improved. Finally, the test coverage is lacking for Docker Commands. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7782) Enable user re-mapping for Docker containers in yarn-default.xml

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7782:
--
Labels: Docker  (was: )

> Enable user re-mapping for Docker containers in yarn-default.xml
> 
>
> Key: YARN-7782
> URL: https://issues.apache.org/jira/browse/YARN-7782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Fix For: 3.1.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7782.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce the user and group for the running user. In YARN-7430, user remapping 
> defaults to true, but yarn-default.xml is still set to false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4595) Add support for configurable read-only mounts when launching Docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4595:
--
Labels: Docker  (was: )

> Add support for configurable read-only mounts when launching Docker containers
> --
>
> Key: YARN-4595
> URL: https://issues.apache.org/jira/browse/YARN-4595
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-4595.1.patch, YARN-4595.2.patch, YARN-4595.3.patch, 
> YARN-4595.4.patch, YARN-4595.5.patch
>
>
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container.  We could allow 
> the user to set a list of mounts in the environment of ContainerLaunchContext 
> (e.g. /dir1:/targetdir1,/dir2:/targetdir2).  These would be mounted read-only 
> to the specified target locations.
> Due to permissions and user concerns, for this ticket we will require the 
> mounts to be resources that are in the distributed cache.
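> A minimal sketch (illustrative only, not the actual YARN code; the class and
> method names are made up) of parsing such an environment value into
> source/target pairs:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> public class MountListParser {
>   // Splits "/dir1:/targetdir1,/dir2:/targetdir2" into {host dir, container dir} pairs.
>   static List<String[]> parse(String env) {
>     List<String[]> mounts = new ArrayList<>();
>     for (String entry : env.split(",")) {
>       String[] parts = entry.split(":");
>       if (parts.length == 2) {
>         mounts.add(parts);
>       }
>     }
>     return mounts;
>   }
>
>   public static void main(String[] args) {
>     for (String[] m : parse("/dir1:/targetdir1,/dir2:/targetdir2")) {
>       System.out.println(m[0] + " -> " + m[1] + " (ro)");
>     }
>   }
> }
> {code}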



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4267) Add additional logging to container launch implementations in container-executor

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4267:
--
Labels: Docker  (was: )

> Add additional logging to container launch implementations in 
> container-executor
> 
>
> Key: YARN-4267
> URL: https://issues.apache.org/jira/browse/YARN-4267
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4267.001.patch
>
>
> The container launch implementations in container-executor involve several 
> steps, and when a container launch fails, it is not always evident which steps 
> completed successfully. This is particularly true of launching docker 
> containers. Additional logging would help in diagnosing and debugging such 
> issues. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5793) Trim configuration values in DockerLinuxContainerRuntime

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5793:
--
Labels: Docker  (was: )

> Trim configuration values in DockerLinuxContainerRuntime
> 
>
> Key: YARN-5793
> URL: https://issues.apache.org/jira/browse/YARN-5793
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Tianyin Xu
>Assignee: Tianyin Xu
>Priority: Minor
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5793..patch, YARN-5793.0001.patch
>
>
> The current implementation of {{DockerLinuxContainerRuntime}} does not follow 
> the practice of trimming configuration values. This leads to errors if users 
> set values containing spaces or newlines.
> see the following YARN commits as reference:
> YARN-3395. FairScheduler: Trim whitespaces when using username for queuename.
> YARN-2869. CapacityScheduler should trim sub queue names when parse 
> configuration.
> YARN-2843. Fixed NodeLabelsManager to trim inputs for hosts and labels so as 
> to make them work correctly.
> and many other Hadoop/HDFS commits (just list a few):
> HDFS-9708. FSNamesystem.initAuditLoggers() doesn't trim classnames
> HDFS-2799. Trim fs.checkpoint.dir values.
> HADOOP-6578. Configuration should trim whitespace around a lot of value types
> HADOOP-6534. Trim whitespace from directory lists initializing
> A patch is available against trunk.
> {code:title=DockerLinuxContainerRuntime.java|borderStyle=solid}
> @@ -219,9 +219,9 @@ public void initialize(Configuration conf)
>      dockerClient = new DockerClient(conf);
>      allowedNetworks.clear();
>      allowedNetworks.addAll(Arrays.asList(
> -        conf.getStrings(YarnConfiguration.NM_DOCKER_ALLOWED_CONTAINER_NETWORKS,
> +        conf.getTrimmedStrings(YarnConfiguration.NM_DOCKER_ALLOWED_CONTAINER_NETWORKS,
>              YarnConfiguration.DEFAULT_NM_DOCKER_ALLOWED_CONTAINER_NETWORKS)));
> -    defaultNetwork = conf.get(
> +    defaultNetwork = conf.getTrimmed(
>          YarnConfiguration.NM_DOCKER_DEFAULT_CONTAINER_NETWORK,
>          YarnConfiguration.DEFAULT_NM_DOCKER_DEFAULT_CONTAINER_NETWORK);
>
> @@ -237,7 +237,7 @@ public void initialize(Configuration conf)
>        throw new ContainerExecutionException(message);
>      }
>
> -    privilegedContainersAcl = new AccessControlList(conf.get(
> +    privilegedContainersAcl = new AccessControlList(conf.getTrimmed(
>          YarnConfiguration.NM_DOCKER_PRIVILEGED_CONTAINERS_ACL,
>          YarnConfiguration.DEFAULT_NM_DOCKER_PRIVILEGED_CONTAINERS_ACL));
>    }
> @@ -439,7 +439,7 @@ public void launchContainer(ContainerRuntimeContext ctx)
>          LOCALIZED_RESOURCES);
>      @SuppressWarnings("unchecked")
>      List<String> userLocalDirs = ctx.getExecutionAttribute(USER_LOCAL_DIRS);
> -    Set<String> capabilities = new HashSet<>(Arrays.asList(conf.getStrings(
> +    Set<String> capabilities = new HashSet<>(Arrays.asList(conf.getTrimmedStrings(
>          YarnConfiguration.NM_DOCKER_CONTAINER_CAPABILITIES,
>          YarnConfiguration.DEFAULT_NM_DOCKER_CONTAINER_CAPABILITIES)));
> {code}
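> For illustration, a small standalone sketch (plain Java, not part of the patch)
> of why trimming matters: an untrimmed list fails membership checks whenever the
> configured value contains stray whitespace.
> {code:java}
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.Set;
>
> public class TrimExample {
>   public static void main(String[] args) {
>     // A value as an admin might write it, with a stray space and newline.
>     String raw = "host, bridge,\nnone";
>     Set<String> untrimmed = new HashSet<>(Arrays.asList(raw.split(",")));
>     Set<String> trimmed = new HashSet<>();
>     for (String s : raw.split(",")) {
>       trimmed.add(s.trim());   // roughly what the getTrimmed* variants do
>     }
>     System.out.println(untrimmed.contains("bridge"));  // false: " bridge" != "bridge"
>     System.out.println(trimmed.contains("bridge"));    // true
>   }
> }
> {code}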



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4759) Fix signal handling for docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4759:
--
Labels: Docker  (was: )

> Fix signal handling for docker containers
> -
>
> Key: YARN-4759
> URL: https://issues.apache.org/jira/browse/YARN-4759
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-4759.001.patch, YARN-4759.002.patch, 
> YARN-4759.003.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be 
> revisited for docker containers. For example, container reacquisition on NM 
> restart might not work, depending on which user the process in the container 
> runs as. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7626) Allow regular expression matching in container-executor.cfg for devices and named docker volumes mount

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7626:
--
Labels: Docker  (was: )

> Allow regular expression matching in container-executor.cfg for devices and 
> named docker volumes mount
> --
>
> Key: YARN-7626
> URL: https://issues.apache.org/jira/browse/YARN-7626
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7626.001.patch, YARN-7626.002.patch, 
> YARN-7626.003.patch, YARN-7626.004.patch, YARN-7626.005.patch, 
> YARN-7626.006.patch, YARN-7626.007.patch, YARN-7626.008.patch, 
> YARN-7626.009.patch, YARN-7626.010.patch, YARN-7626.011.patch
>
>
> Currently, when we configure some of the GPU device related fields (like ) in 
> container-executor.cfg, these fields are generated based on different driver 
> versions or GPU device names. We want to enable regular expression matching 
> so that users don't need to manually set up these fields when configuring 
> container-executor.cfg.
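> A minimal sketch of the kind of matching intended here, written in Java for
> illustration only (the real check lives in the native container-executor, and
> the pattern and device names are hypothetical):
> {code:java}
> import java.util.regex.Pattern;
>
> public class DeviceWhitelistRegex {
>   public static void main(String[] args) {
>     // Hypothetical whitelist entry covering all nvidia device nodes.
>     Pattern allowed = Pattern.compile("/dev/nvidia[0-9]+");
>     String[] requested = {"/dev/nvidia0", "/dev/nvidia7", "/dev/sda1"};
>     for (String dev : requested) {
>       boolean ok = allowed.matcher(dev).matches();
>       System.out.println(dev + (ok ? " permitted" : " rejected"));
>     }
>   }
> }
> {code}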



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3851) Add support for container runtimes in YARN

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3851:
--
Labels: Docker  (was: )

> Add support for container runtimes in YARN 
> ---
>
> Key: YARN-3851
> URL: https://issues.apache.org/jira/browse/YARN-3851
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
>
> We need the ability to support different container types within the same 
> executor. Container runtimes are lower-level implementations for supporting 
> specific container engines (e.g docker). These are meant to be independent of 
> executors themselves - a given executor (e.g LinuxContainerExecutor) could 
> potentially switch between different container runtimes depending on what a 
> client/application is requesting. An executor continues to provide higher 
> level functionality that could be specific to an operating system - for 
> example, LinuxContainerExecutor continues to handle cgroups, users, 
> diagnostic events etc. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5458:
--
Labels: Docker  (was: )

> Rename DockerStopCommandTest to TestDockerStopCommand
> -
>
> Key: YARN-5458
> URL: https://issues.apache.org/jira/browse/YARN-5458
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Trivial
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5458.001.patch
>
>
> DockerStopCommandTest does not follow the naming convention for test classes; 
> rename it to TestDockerStopCommand.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6456:
--
Labels: Docker  (was: )

> Allow administrators to set a single ContainerRuntime for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Priority: Major
>  Labels: Docker
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user's decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even, if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most of the cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option that it is not the user but the admin that requires 
> to use Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7516) Security check for trusted docker image

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7516:
--
Labels: Docker  (was: )

> Security check for trusted docker image
> ---
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch, YARN-7516.004.patch, YARN-7516.005.patch, 
> YARN-7516.006.patch, YARN-7516.007.patch, YARN-7516.008.patch, 
> YARN-7516.009.patch, YARN-7516.010.patch, YARN-7516.011.patch, 
> YARN-7516.012.patch, YARN-7516.013.patch, YARN-7516.014.patch, 
> YARN-7516.015.patch, YARN-7516.016.patch, YARN-7516.017.patch, 
> YARN-7516.018.patch
>
>
> Hadoop YARN Services can support using a private docker registry image or a 
> docker image from docker hub.  In the current implementation, Hadoop security 
> is enforced through username and group membership, and enforces uid:gid 
> consistency between the docker container and the distributed file system.  
> There is a cloud use case for having the ability to run untrusted docker 
> images on the same cluster for testing.  
> The basic requirement for an untrusted container is to ensure all kernel and 
> root privileges are dropped, and that there is no interaction with the 
> distributed file system, to avoid contamination.  We can probably enforce 
> detection of untrusted docker images by checking the following (a sketch of 
> the registry check follows the list):
> # If the docker image is from the public docker hub repository, the container 
> is automatically flagged as insecure, disk volume mounts are disabled 
> automatically, and all kernel capabilities are dropped.
> # If the docker image is from a private repository in docker hub, and a white 
> list allows the private repository, disk volume mounts are allowed and kernel 
> capabilities follow the allowed list.
> # If the docker image is from a private trusted registry with an image name 
> like "private.registry.local:5000/centos", and the white list allows this 
> private trusted repository, disk volume mounts are allowed and kernel 
> capabilities follow the allowed list.
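> A minimal sketch of the registry prefix check, assuming a simple whitelist of
> trusted registries; the class and helper are illustrative, not the actual YARN
> implementation:
> {code:java}
> import java.util.Arrays;
> import java.util.List;
>
> public class TrustedImageCheck {
>   // An image is trusted only if it is prefixed by a whitelisted registry.
>   static boolean isTrusted(String image, List<String> trustedRegistries) {
>     for (String registry : trustedRegistries) {
>       if (image.startsWith(registry + "/")) {
>         return true;
>       }
>     }
>     return false;   // plain docker hub names ("centos:latest") fall through
>   }
>
>   public static void main(String[] args) {
>     List<String> trusted = Arrays.asList("private.registry.local:5000");
>     System.out.println(isTrusted("private.registry.local:5000/centos", trusted)); // true
>     System.out.println(isTrusted("centos:latest", trusted));                      // false
>   }
> }
> {code}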



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8027) Setting hostname of docker container breaks for --net=host in docker 1.13

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8027:
--
Labels: Docker  (was: )

> Setting hostname of docker container breaks for --net=host in docker 1.13
> -
>
> Key: YARN-8027
> URL: https://issues.apache.org/jira/browse/YARN-8027
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0, 3.0.3
>
> Attachments: YARN-8027-branch-3.0.001.patch, 
> YARN-8027-branch-3.0.002.patch, YARN-8027.001.patch
>
>
> In DockerLinuxContainerRuntime:launchContainer, we are adding the --hostname 
> argument to the docker run command to set the hostname in the container to 
> something like:  ctr-e84-1520889172376-0001-01-01.
> This does not work when combined with the --net=host command line option in 
> Docker 1.13.1.  It causes multiple failures because clients cannot resolve 
> that hostname.
> We haven't seen this before because we were using docker 1.12.6 which seems 
> to ignore --hostname when you are using --net=host.
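> A minimal sketch of the guard this implies; the class, method, and values are
> illustrative, not the actual runtime code:
> {code:java}
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
>
> public class HostNetworkGuard {
>   static List<String> runArgs(String network, String hostname) {
>     List<String> args = new ArrayList<>(Arrays.asList("docker", "run", "--net=" + network));
>     if (!"host".equals(network)) {
>       args.add("--hostname=" + hostname);   // only set a hostname off the host network
>     }
>     return args;
>   }
>
>   public static void main(String[] args) {
>     System.out.println(runArgs("host", "ctr-e84-1520889172376-0001-01-01"));
>     System.out.println(runArgs("bridge", "ctr-e84-1520889172376-0001-01-01"));
>   }
> }
> {code}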



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7654) Support ENTRY_POINT for docker container

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7654:
--
Labels: Docker  (was: )

> Support ENTRY_POINT for docker container
> 
>
> Key: YARN-7654
> URL: https://issues.apache.org/jira/browse/YARN-7654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
>  Labels: Docker
> Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch
>
>
> A Docker image may have an ENTRY_POINT predefined, but this is not supported 
> in the current implementation.  It would be nice if we could detect the 
> existence of {{launch_command}} and, based on this variable, launch the 
> docker container in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4578) Directories that are mounted in docker containers need to be more restrictive/container-specific

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4578:
--
Labels: Docker  (was: )

> Directories that are mounted in docker containers need to be more 
> restrictive/container-specific
> 
>
> Key: YARN-4578
> URL: https://issues.apache.org/jira/browse/YARN-4578
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-4578.001.patch, YARN-4578.002.patch
>
>
> Currently, the "top level" log and local directories are mounted inside 
> docker containers (see below for an example). This is not restrictive enough 
> - we need to ensure that only container specific directories are mounted. 
> {code}
> /dev/sda4 on /grid/0/hadoop/yarn/local type ext4 (rw,relatime,data=ordered)
> /dev/sda4 on 
> /grid/0/hadoop/yarn/local/usercache/root/appcache/application_1451931954322_0020/container_e50_1451931954322_0020_01_02
>  type ext4 (rw,relatime,data=ordered)
> /dev/sda4 on /grid/0/hadoop/yarn/log type ext4 (rw,relatime,data=ordered)
> /dev/sdb1 on /grid/1/hadoop/yarn/local type ext4 (rw,relatime,data=ordered)
> /dev/sdb1 on /grid/1/hadoop/yarn/log type ext4 (rw,relatime,data=ordered)
> /dev/sdc1 on /grid/2/hadoop/yarn/local type ext4 (rw,relatime,data=ordered)
> /dev/sdc1 on /grid/2/hadoop/yarn/log type ext4 (rw,relatime,data=ordered)
> /dev/sdd1 on /grid/3/hadoop/yarn/local type ext4 (rw,relatime,data=ordered)
> /dev/sdd1 on /grid/3/hadoop/yarn/log type ext4 (rw,relatime,data=ordered)
> /dev/sde1 on /grid/4/hadoop/yarn/local type ext4 (rw,relatime,data=ordered)
> /dev/sde1 on /grid/4/hadoop/yarn/log type ext4 (rw,relatime,data=ordered)
> /dev/sdf1 on /grid/5/hadoop/yarn/local type ext4 (rw,relatime,data=ordered)
> /dev/sdf1 on /grid/5/hadoop/yarn/log type ext4 (rw,relatime,data=ordered)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7973:
--
Labels: Docker  (was: )

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0
>
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8206) Sending a kill does not immediately kill docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8206:
--
Labels: Docker  (was: )

> Sending a kill does not immediately kill docker containers
> --
>
> Key: YARN-8206
> URL: https://issues.apache.org/jira/browse/YARN-8206
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8206.001.patch, YARN-8206.002.patch, 
> YARN-8206.003.patch, YARN-8206.004.patch
>
>
> {noformat}
> if (ContainerExecutor.Signal.KILL.equals(signal)
> || ContainerExecutor.Signal.TERM.equals(signal)) {
>   handleContainerStop(containerId, env);
> {noformat}
> Currently in the code, we are handling both SIGKILL and SIGTERM as equivalent 
> for docker containers. However, they should actually be separate. When YARN 
> sends a SIGKILL to a process, it means for it to die immediately and not sit 
> around waiting for anything. This ensures an immediate reclamation of 
> resources. Additionally, if a SIGTERM is sent before the SIGKILL, the task 
> might not handle the signal correctly, and will then end up as a failed task 
> instead of a killed task. This is especially bad for preemption. 
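> A minimal sketch of separating the two signals; the command lists are ordinary
> docker CLI invocations and the class is illustrative, not the NM code:
> {code:java}
> import java.util.Arrays;
> import java.util.List;
>
> public class DockerSignalMapping {
>   static List<String> commandFor(String signal, String containerId) {
>     if ("KILL".equals(signal)) {
>       return Arrays.asList("docker", "kill", containerId);   // immediate, reclaims resources
>     }
>     if ("TERM".equals(signal)) {
>       return Arrays.asList("docker", "stop", containerId);   // graceful, lets the task exit
>     }
>     return Arrays.asList("docker", "kill", "--signal=" + signal, containerId);
>   }
>
>   public static void main(String[] args) {
>     System.out.println(commandFor("KILL", "container_01"));
>     System.out.println(commandFor("TERM", "container_01"));
>   }
> }
> {code}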



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6576) Improve Diagnostic by moving Error stack trace from NM to slider AM

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6576:
--
Labels: Docker  (was: )

> Improve Diagnostic by moving Error stack trace from NM to slider AM
> ---
>
> Key: YARN-6576
> URL: https://issues.apache.org/jira/browse/YARN-6576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Yesha Vora
>Priority: Major
>  Labels: Docker
>
> Slider Master diagnostics should be improved to show the root cause of app 
> failures for issues like a missing docker image.
> Currently, the Slider Master log does not show a proper error message to debug 
> such failures. Users have to access NodeManager logs to find the root cause of 
> issues where the container failed to start. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5671) Add support for Docker image clean up

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5671:
--
Labels: Docker  (was: )

> Add support for Docker image clean up
> -
>
> Key: YARN-5671
> URL: https://issues.apache.org/jira/browse/YARN-5671
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
>Priority: Major
>  Labels: Docker
>
> Regarding Docker image localization, we also need a way to clean up old/stale 
> Docker images to save storage space. We may extend the deletion service to 
> utilize "docker rmi" to do this.
> This is related to YARN-3854 and may depend on its implementation. Please 
> refer to YARN-3854 for Docker image localization details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4262) Allow whitelisted users to run privileged docker containers.

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4262:
--
Labels: Docker  (was: )

> Allow whitelisted users to run privileged docker containers. 
> -
>
> Key: YARN-4262
> URL: https://issues.apache.org/jira/browse/YARN-4262
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4262.001.patch, YARN-4262.002.patch, 
> YARN-4262.003.patch
>
>
> (Updated based on discussion in the JIRA)
> There are scenarios where privileged containers are necessary in order to run 
> certain kinds of applications (one example is trying to run postgresql/oracle 
> inside containers). However, given the security implications, we should 
> ensure that (see the sketch after this list): 
> 1) privileged containers are disabled by default
> 2) if enabled, only a whitelisted set of users should be allowed to launch 
> such containers and 
> 3) Not all containers launched by whitelisted users need to be privileged 
> containers : whitelisted users need to explicitly request that a privileged 
> container be launched.
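> A minimal sketch of those three checks combined; the method and whitelist
> values are illustrative, not the actual ACL implementation:
> {code:java}
> import java.util.Arrays;
> import java.util.List;
>
> public class PrivilegedContainerCheck {
>   static boolean allowPrivileged(boolean featureEnabled, boolean requested,
>                                  String user, List<String> whitelist) {
>     if (!requested) {
>       return false;   // never escalate a container that did not ask for it
>     }
>     return featureEnabled && whitelist.contains(user);
>   }
>
>   public static void main(String[] args) {
>     List<String> whitelist = Arrays.asList("hdfs", "dbadmin");
>     System.out.println(allowPrivileged(false, true, "dbadmin", whitelist)); // false: disabled
>     System.out.println(allowPrivileged(true, true, "alice", whitelist));    // false: not whitelisted
>     System.out.println(allowPrivileged(true, true, "dbadmin", whitelist));  // true
>   }
> }
> {code}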



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4266) Allow users to enter containers as UID:GID pair instead of by username

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4266:
--
Labels: Docker  (was: )

> Allow users to enter containers as UID:GID pair instead of by username
> --
>
> Key: YARN-4266
> URL: https://issues.apache.org/jira/browse/YARN-4266
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: luhuichun
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-4266-branch-2.8.001.patch, YARN-4266.001.patch, 
> YARN-4266.001.patch, YARN-4266.002.patch, YARN-4266.003.patch, 
> YARN-4266.004.patch, YARN-4266.005.patch, YARN-4266.006.patch, 
> YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping.pdf, 
> YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping_v2.pdf, 
> YARN-4266_Allow_whitelisted_users_to_disable_user_re-mapping_v3.pdf
>
>
> Docker provides a mechanism (the --user switch) that enables us to specify 
> the user the container processes should run as. We use this mechanism today 
> when launching docker containers. In non-secure mode, we run the docker 
> container based on 
> `yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user` and in 
> secure mode, as the submitting user. However, this mechanism breaks down with 
> a large number of 'pre-created' images which don't necessarily have the users 
> available within the image. Examples of such images include shared images 
> that need to be used by multiple users. We need a way in which we can allow a 
> pre-defined set of users to run containers based on existing images, without 
> using the --user switch. There are some implications of disabling this user 
> squashing that we'll need to work through: log aggregation, artifact 
> deletion, etc.
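> A minimal sketch of passing a numeric UID:GID pair instead of a username;
> illustrative only, with made-up IDs:
> {code:java}
> public class DockerUserFlag {
>   static String userFlag(int uid, int gid) {
>     return "--user=" + uid + ":" + gid;
>   }
>
>   public static void main(String[] args) {
>     // e.g. the submitting user's uid/gid as resolved on the host
>     System.out.println("docker run " + userFlag(1001, 1001) + " centos:7 id");
>   }
> }
> {code}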



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5670) Add support for Docker image clean up

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5670:
--
Labels: Docker  (was: )

> Add support for Docker image clean up
> -
>
> Key: YARN-5670
> URL: https://issues.apache.org/jira/browse/YARN-5670
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
>
> Regarding Docker image localization, we also need a way to clean up old/stale 
> Docker images to save storage space. We may extend the deletion service to 
> utilize "docker rmi" to do this.
> This is related to YARN-3854 and may depend on its implementation. Please 
> refer to YARN-3854 for Docker image localization details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7644) NM gets backed up deleting docker containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7644:
--
Labels: Docker  (was: )

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7353) Docker permitted volumes don't properly check for directories

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7353:
--
Labels: Docker  (was: )

> Docker permitted volumes don't properly check for directories
> -
>
> Key: YARN-7353
> URL: https://issues.apache.org/jira/browse/YARN-7353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-7353.001.patch, YARN-7353.002.patch, 
> YARN-7353.003.patch
>
>
> {noformat:title=docker-util.c:check_mount_permitted()}
> // directory check
> permitted_mount_len = strlen(permitted_mounts[i]);
> if (permitted_mount_len > 0
>     && permitted_mounts[i][permitted_mount_len - 1] == '/') {
>   if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) == 0) {
>     ret = 1;
>     break;
>   }
> }
> {noformat}
> This code will treat "/home/" as a directory, but not "/home"
> {noformat}
> [  FAILED  ] 3 tests, listed below:
> [  FAILED  ] TestDockerUtil.test_check_mount_permitted
> [  FAILED  ] TestDockerUtil.test_normalize_mounts
> [  FAILED  ] TestDockerUtil.test_add_rw_mounts
> {noformat}
> Additionally, YARN-6623 introduced new test failures in the C++ 
> container-executor test "cetest"
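> A Java illustration (not the C fix itself) of a prefix check that treats
> "/home" and "/home/" the same way; class and method names are made up:
> {code:java}
> public class MountPrefixCheck {
>   // Normalize the permitted entry to end with '/' so "/home" and "/home/" behave alike.
>   static boolean isPermittedDir(String normalizedPath, String permitted) {
>     String dir = permitted.endsWith("/") ? permitted : permitted + "/";
>     return normalizedPath.equals(permitted) || normalizedPath.startsWith(dir);
>   }
>
>   public static void main(String[] args) {
>     System.out.println(isPermittedDir("/home/user", "/home/")); // true
>     System.out.println(isPermittedDir("/home/user", "/home"));  // true
>     System.out.println(isPermittedDir("/homework", "/home"));   // false
>   }
> }
> {code}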



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5428) Allow for specifying the docker client configuration directory

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5428:
--
Labels: Docker oct16-medium  (was: oct16-medium)

> Allow for specifying the docker client configuration directory
> --
>
> Key: YARN-5428
> URL: https://issues.apache.org/jira/browse/YARN-5428
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker, oct16-medium
> Fix For: 3.1.0
>
> Attachments: YARN-5428.001.patch, YARN-5428.002.patch, 
> YARN-5428.003.patch, YARN-5428.004.patch, YARN-5428.005.patch, 
> YARN-5428.006.patch, YARN-5428.007.patch, YARN-5428.008.patch, 
> YARN-5428.009.patch, 
> YARN-5428Allowforspecifyingthedockerclientconfigurationdirectory.pdf
>
>
> The docker client allows for specifying a configuration directory that 
> contains the docker client's configuration. It is common to store "docker 
> login" credentials in this config, to avoid the need to docker login on each 
> cluster member. 
> By default the docker client config is $HOME/.docker/config.json on Linux. 
> However, this does not work with the current container executor user 
> switching and it may also be desirable to centralize this configuration 
> beyond the single user's home directory.
> Note that the command line arg is for the configuration directory NOT the 
> configuration file.
> This change will be needed to allow YARN to automatically pull images at 
> localization time or within container executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5360) Decouple host user and Docker container user

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5360:
--
Labels: Docker  (was: )

> Decouple host user and Docker container user
> 
>
> Key: YARN-5360
> URL: https://issues.apache.org/jira/browse/YARN-5360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
>
> There is currently *a dependency between the job submitting user and the user 
> in the Docker image* in LCE. For instance, in order to run the Docker 
> container as the yarn user, we can choose to set 
> "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" to yarn 
> and leave 
> "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users" 
> at its default (true). Then LCE will choose yarn (UID maybe 1001) as the user 
> running jobs.
> LCE will mount the generated launch_container.sh (owned by the running job 
> user) and /etc/passwd (*currently the code mounts it to the container's 
> /etc/password, which I think is a mistake*) into the Docker container and 
> utilizes the "docker run --user=" option to get it done internally.
> Mounting /etc/passwd into the container is not a good choice because it 
> overrides the original users defined in the Docker image. As far as I know, 
> since Docker v1.8 (or maybe earlier), the Docker run command's "--user=" 
> option accepts a UID, and *when passing a UID, the user does not have to 
> exist in the container*. So we could use the UID instead of the user name to 
> construct the Docker run command, eliminating the need to create the same 
> user in the Docker image. This gives LCE the ability to safely launch any 
> Docker container regardless of which users are defined in it.
> But this is not enough to decouple the host user and the Docker container 
> user. The final solution we are searching for is focused on allowing users to 
> run their Docker images flexibly without involving dependencies on YARN, 
> while making sure the container won't bring in security risks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6305) Improve signaling of short lived containers

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6305:
--
Labels: Docker  (was: )

> Improve signaling of short lived containers
> ---
>
> Key: YARN-6305
> URL: https://issues.apache.org/jira/browse/YARN-6305
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
>
> Currently it is possible for containers to leak and remain in an exited state 
> if a docker container is not fully started before being killed. Depending on 
> the selected Docker storage driver, the lower bound on starting a container 
> can be as much as three seconds (using {{docker run}}). If an implicit image 
> pull occurs, this could be much longer.
> When a container is not fully started, the PID is not available yet. As a 
> result, {{ContainerLaunch#cleanUpContainer}} will not signal the container as 
> it relies on the PID. The PID is not required for docker client operations, 
> so allowing the signaling to occur anyway appears to be appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7917) Fix failing test TestDockerContainerRuntime#testLaunchContainerWithDockerTokens

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7917:
--
Labels: Docker  (was: )

> Fix failing test 
> TestDockerContainerRuntime#testLaunchContainerWithDockerTokens
> ---
>
> Key: YARN-7917
> URL: https://issues.apache.org/jira/browse/YARN-7917
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Minor
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7917.001.patch
>
>
> {{TestDockerContainerRuntime#testLaunchContainerWithDockerTokens}} is 
> failing. YARN-7815 ended up going in before YARN-5428, and this test included 
> in YARN-5428 didn't include the updated mounts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5669) Add support for Docker pull

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5669:
--
Labels: Docker  (was: )

> Add support for Docker pull
> ---
>
> Key: YARN-5669
> URL: https://issues.apache.org/jira/browse/YARN-5669
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: luhuichun
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: YARN-5669.001.patch
>
>
> We need to add docker pull to support Docker image localization. Refer to 
> YARN-3854 for the details. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5534) Allow user provided Docker volume mount list

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5534:
--
Labels: Docker  (was: )

> Allow user provided Docker volume mount list
> 
>
> Key: YARN-5534
> URL: https://issues.apache.org/jira/browse/YARN-5534
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: luhuichun
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-5534.001.patch, YARN-5534.002.patch, 
> YARN-5534.003.patch, YARN-5534.004.patch, YARN-5534.005.patch, 
> YARN-5534.006.patch, YARN-5534.007.patch
>
>
> YARN-6623 added support in container-executor for admin supplied Docker 
> volume whitelists. This allows controlling which host directories can be 
> mounted into Docker containers launched by YARN. A read-only and read-write 
> whitelist was added. We now need the ability for users to supply the mounts 
> they require for their application, which will be validated against the admin 
> whitelist in container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5298) Mount usercache and NM filecache directories into Docker container

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5298:
--
Labels: Docker  (was: )

> Mount usercache and NM filecache directories into Docker container
> --
>
> Key: YARN-5298
> URL: https://issues.apache.org/jira/browse/YARN-5298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Varun Vasudev
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5298.001.patch, YARN-5298.002.patch
>
>
> Currently, we don't mount the usercache and the NM filecache directories into 
> the Docker container. This can lead to issues with containers that rely on 
> public and application scope resources.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7286) Add support for docker to have no capabilities

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7286:
--
Labels: Docker  (was: )

> Add support for docker to have no capabilities
> --
>
> Key: YARN-7286
> URL: https://issues.apache.org/jira/browse/YARN-7286
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-7286.001.patch, YARN-7286.002.patch, 
> YARN-7286.003.patch, YARN-7286.004.patch, YARN-7286.005.patch, 
> YARN-7286.006.patch, YARN-7286.007.patch, YARN-7286.008.patch
>
>
> Support for controlling capabilities was introduced in YARN-4258. However, it 
> does not allow for the capabilities list to be NULL, since {{getStrings()}} 
> will treat an empty value the same as it treats an unset property. So, a NULL 
> list will actually give the default capabilities list.
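> A small standalone sketch (plain java.util.Properties, not the Hadoop
> Configuration API) of why "empty" and "unset" need to be distinguished when a
> NULL capability list should mean "drop everything" rather than "use defaults":
> {code:java}
> import java.util.Properties;
>
> public class EmptyVsUnset {
>   static String[] capabilities(Properties props, String key, String[] defaults) {
>     String raw = props.getProperty(key);
>     if (raw == null) {
>       return defaults;        // not set at all: fall back to the defaults
>     }
>     if (raw.trim().isEmpty()) {
>       return new String[0];   // explicitly empty: no capabilities
>     }
>     return raw.split("\\s*,\\s*");
>   }
>
>   public static void main(String[] args) {
>     Properties props = new Properties();
>     String[] defaults = {"CHOWN", "SETUID"};
>     System.out.println(capabilities(props, "docker.capabilities", defaults).length); // 2
>     props.setProperty("docker.capabilities", "");
>     System.out.println(capabilities(props, "docker.capabilities", defaults).length); // 0
>   }
> }
> {code}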



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8064) Docker ".cmd" files should not be put in hadoop.tmp.dir

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8064:
--
Labels: Docker  (was: )

> Docker ".cmd" files should not be put in hadoop.tmp.dir
> ---
>
> Key: YARN-8064
> URL: https://issues.apache.org/jira/browse/YARN-8064
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8064-branch-3.1.009.patch, YARN-8064.001.patch, 
> YARN-8064.002.patch, YARN-8064.003.patch, YARN-8064.004.patch, 
> YARN-8064.005.patch, YARN-8064.006.patch, YARN-8064.007.patch, 
> YARN-8064.008.patch, YARN-8064.009.patch
>
>
> Currently all of the docker command files are being put into 
> {{hadoop.tmp.dir}}, which doesn't get cleaned up. So, eventually all of the 
> inodes will fill up and no more tasks will be able to run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7904) Privileged, trusted containers need all of their bind-mounted directories to be read-only

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7904:
--
Labels: Docker  (was: )

> Privileged, trusted containers need all of their bind-mounted directories to 
> be read-only
> -
>
> Key: YARN-7904
> URL: https://issues.apache.org/jira/browse/YARN-7904
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> Since they will be running as some other user than themselves, the NM likely 
> won't be able to clean up after them because of permissions issues. So, to 
> prevent this, we should make these directories read-only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5628) Remove package line length checkstyle rule

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5628:
--
Labels: Docker  (was: )

> Remove package line length checkstyle rule
> --
>
> Key: YARN-5628
> URL: https://issues.apache.org/jira/browse/YARN-5628
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Trivial
>  Labels: Docker
>
> The packages related to the DockerLinuxContainerRuntime all exceed the 80 
> char line length limit enforced by checkstyle. This causes every build to 
> fail with a -1. I would like to exclude this rule from causing a failure.
> {code}
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommandExecutor.java:17:package
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker;:
>  Line is longer than 80 characters (found 88).
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerContainerStatusHandler.java:17:package
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker;:
>  Line is longer than 80 characters (found 88).
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/package-info.java:23:package
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker;:
>  Line is longer than 80 characters (found 88).
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/MockPrivilegedOperationCaptor.java:17:package
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged;: 
> Line is longer than 80 characters (found 84).
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerRuntimeTestingUtils.java:17:package
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime;: 
> Line is longer than 80 characters (found 81).
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/MockDockerContainerStatusHandler.java:17:package
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker;:
>  Line is longer than 80 characters (found 88).
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerCommandExecutor.java:17:package
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker;:
>  Line is longer than 80 characters (found 88).
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerContainerStatusHandler.java:17:package
>  
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker;:
>  Line is longer than 80 characters (found 88).
> {code}
> Alternatively, we could look to restructure the packages here, but I question 
> what value this check really provides.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7797) Docker host network can not obtain IP address for RegistryDNS

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7797:
--
Labels: Docker  (was: )

> Docker host network can not obtain IP address for RegistryDNS
> -
>
> Key: YARN-7797
> URL: https://issues.apache.org/jira/browse/YARN-7797
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.1.0
>
> Attachments: YARN-7797.001.patch, YARN-7797.002.patch, 
> YARN-7797.003.patch, YARN-7797.004.patch, YARN-7797.005.patch
>
>
> When docker is configured to use the host network, the docker inspect command 
> does not return the IP address of the container.  This prevents IP information 
> from being collected for RegistryDNS to register a hostname entry for the 
> docker container.
> The proposed solution is to intelligently detect the docker network 
> deployment method and report back the host IP address for RegistryDNS.
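> A minimal sketch of that fallback; the network-mode check and method are
> illustrative only, while the real code inspects the container's configured
> network:
> {code:java}
> import java.net.InetAddress;
> import java.net.UnknownHostException;
>
> public class ContainerIpResolver {
>   static String resolveIp(String networkMode, String inspectedIp) throws UnknownHostException {
>     if ("host".equals(networkMode) || inspectedIp == null || inspectedIp.isEmpty()) {
>       return InetAddress.getLocalHost().getHostAddress();   // host address for RegistryDNS
>     }
>     return inspectedIp;                                      // bridge/overlay: container IP
>   }
>
>   public static void main(String[] args) throws UnknownHostException {
>     System.out.println(resolveIp("host", ""));
>     System.out.println(resolveIp("bridge", "172.17.0.2"));
>   }
> }
> {code}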



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7667) Docker Stop grace period should be configurable

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7667:
--
Labels: Docker  (was: )

> Docker Stop grace period should be configurable
> ---
>
> Key: YARN-7667
> URL: https://issues.apache.org/jira/browse/YARN-7667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0
>
> Attachments: YARN-7667.001.patch, YARN-7667.002.patch, 
> YARN-7667.003.patch, YARN-7667.004.patch, YARN-7667.005.patch, 
> YARN-7667.006.patch
>
>
> {{DockerStopCommand}} has a {{setGracePeriod}} method, but it is never 
> called. So, the stop uses the 10 second default grace period from docker.
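> A minimal sketch of wiring a configurable grace period through to the stop
> command; the property name is hypothetical and the class only builds the CLI
> arguments:
> {code:java}
> import java.util.Arrays;
> import java.util.List;
>
> public class DockerStopWithGracePeriod {
>   static List<String> stopCommand(String containerId, int gracePeriodSeconds) {
>     return Arrays.asList("docker", "stop", "--time=" + gracePeriodSeconds, containerId);
>   }
>
>   public static void main(String[] args) {
>     // Hypothetical knob; the real change would read an NM configuration property.
>     int gracePeriod = Integer.getInteger("docker.stop.grace.period", 10);
>     System.out.println(stopCommand("container_01", gracePeriod));
>   }
> }
> {code}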



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8228) Docker does not support hostnames greater than 64 characters

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8228:
--
Labels: Docker  (was: )

> Docker does not support hostnames greater than 64 characters
> 
>
> Key: YARN-8228
> URL: https://issues.apache.org/jira/browse/YARN-8228
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Shane Kumpf
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8228.001.patch
>
>
> If the container name is longer than 64 characters, the docker container 
> stays in the Created state and the app fails with the error below.
>  
> {code:java}
> /usr/bin/docker-current: Error response from daemon: oci runtime error: 
> container_linux.go:247: starting container process caused 
> "process_linux.go:364: container init caused \"invalid argument\"".
> Could not invoke docker /usr/bin/docker run 
> --name='container_1524681858728_0001_01_04' --user='99:99' -d 
> --workdir='/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001/container_1524681858728_0001_01_04'
>  --net='hadoop' -v 
> '/grid/0/hadoop/yarn/local/filecache:/grid/0/hadoop/yarn/local/filecache:ro' 
> -v 
> '/grid/0/hadoop/yarn/local/usercache/hrt_qa/filecache:/grid/0/hadoop/yarn/local/usercache/hrt_qa/filecache:ro'
>  -v 
> '/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001/filecache/195/httpd-proxy.conf:/etc/httpd/conf.d/httpd-proxy.conf:ro'
>  -v 
> '/grid/0/hadoop/yarn/log/application_1524681858728_0001/container_1524681858728_0001_01_04:/grid/0/hadoop/yarn/log/application_1524681858728_0001/container_1524681858728_0001_01_04'
>  -v 
> '/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001:/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001'
>  
> --cgroup-parent='/hadoop-yarn-tmp-xxx/container_1524681858728_0001_01_04' 
> --cap-drop='ALL' --cap-add='SYS_CHROOT' --cap-add='MKNOD' --cap-add='SETFCAP' 
> --cap-add='SETPCAP' --cap-add='DAC_READ_SEARCH' --cap-add='FSETID' 
> --cap-add='SYS_PTRACE' --cap-add='CHOWN' --cap-add='SYS_ADMIN' 
> --cap-add='AUDIT_WRITE' --cap-add='SETGID' --cap-add='NET_RAW' 
> --cap-add='FOWNER' --cap-add='SETUID' --cap-add='DAC_OVERRIDE' 
> --cap-add='KILL' --cap-add='NET_BIND_SERVICE' 
> --hostname='httpd-proxy-0.fault-test-component-kill-httpd-docker.hrt-qa.test.com'
>  --group-add '99' 'centos/httpd-24-centos7:latest' 'bash' 
> '/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001/container_1524681858728_0001_01_04/launch_container.sh'
>  .
> Shell output: main : command provided 4
> main : run as user is nobody
> main : requested yarn user is hrt_qa
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Launching docker container...
> Docker run command: /usr/bin/docker run 
> --name='container_1524681858728_0001_01_04' --user='99:99' -d 
> --workdir='/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001/container_1524681858728_0001_01_04'
>  --net='hadoop' -v 
> '/grid/0/hadoop/yarn/local/filecache:/grid/0/hadoop/yarn/local/filecache:ro' 
> -v 
> '/grid/0/hadoop/yarn/local/usercache/hrt_qa/filecache:/grid/0/hadoop/yarn/local/usercache/hrt_qa/filecache:ro'
>  -v 
> '/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001/filecache/195/httpd-proxy.conf:/etc/httpd/conf.d/httpd-proxy.conf:ro'
>  -v 
> '/grid/0/hadoop/yarn/log/application_1524681858728_0001/container_1524681858728_0001_01_04:/grid/0/hadoop/yarn/log/application_1524681858728_0001/container_1524681858728_0001_01_04'
>  -v 
> '/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001:/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001'
>  
> --cgroup-parent='/hadoop-yarn-tmp-xxx/container_1524681858728_0001_01_04' 
> --cap-drop='ALL' --cap-add='SYS_CHROOT' --cap-add='MKNOD' --cap-add='SETFCAP' 
> --cap-add='SETPCAP' --cap-add='DAC_READ_SEARCH' --cap-add='FSETID' 
> --cap-add='SYS_PTRACE' --cap-add='CHOWN' --cap-add='SYS_ADMIN' 
> --cap-add='AUDIT_WRITE' --cap-add='SETGID' --cap-add='NET_RAW' 
> --cap-add='FOWNER' --cap-add='SETUID' --cap-add='DAC_OVERRIDE' 
> --cap-add='KILL' --cap-add='NET_BIND_SERVICE' 
> --hostname='httpd-proxy-0.fault-test-component-kill-httpd-docker.hrt-qa.test.com'
>  --group-add '99' 'centos/httpd-24-centos7:latest' 'bash' 
> '/grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1524681858728_0001/container_1524681858728_0001_01_04/launch_container.sh'
> 
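For context, the 64-character ceiling comes from the Linux kernel's HOST_NAME_MAX, which is why sethostname fails with "invalid argument" when the container starts. A minimal sketch of a pre-launch check (class and method names are illustrative, not from the attached patch):

{code:java}
public class HostnameCheck {
  // Linux HOST_NAME_MAX is 64, so docker rejects longer --hostname values
  // when the container process is initialized.
  static final int LINUX_HOST_NAME_MAX = 64;

  static void validate(String hostname) {
    if (hostname.length() > LINUX_HOST_NAME_MAX) {
      throw new IllegalArgumentException(
          "Hostname '" + hostname + "' is " + hostname.length()
          + " characters; the kernel limit is " + LINUX_HOST_NAME_MAX
          + ". Shorten the component, service, or domain names that form it.");
    }
  }

  public static void main(String[] args) {
    // The hostname from the log above exceeds 64 characters and would fail here.
    validate("httpd-proxy-0.fault-test-component-kill-httpd-docker.hrt-qa.test.com");
  }
}
{code}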

[jira] [Updated] (YARN-3852) Add docker container support to container-executor

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3852:
--
Labels: Docker  (was: )

> Add docker container support to container-executor 
> ---
>
> Key: YARN-3852
> URL: https://issues.apache.org/jira/browse/YARN-3852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Abin Shahab
>Priority: Major
>  Labels: Docker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-3852-1.patch, YARN-3852-2.patch, YARN-3852-3.patch, 
> YARN-3852.patch
>
>
> For security reasons, we need to ensure that access to the docker daemon and 
> the ability to run docker containers are restricted to privileged users (i.e., 
> users running applications should not have direct access to docker). To ensure 
> the node manager can run docker commands, we need to add docker support to the 
> container-executor binary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7231) Make Docker target directory for cgroups configurable by yarn-site.xml

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7231:
--
Labels: Docker  (was: )

> Make Docker target directory for cgroups configurable by yarn-site.xml
> --
>
> Key: YARN-7231
> URL: https://issues.apache.org/jira/browse/YARN-7231
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
>
> Admins may want or need to specify the cgroups target directory inside the 
> docker container for convenience and/or legacy reasons. This will allow the 
> node's cgroups directory to be mounted into the container at a different 
> location, one that is controlled by the admin.
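A small sketch of the idea, assuming a generic Properties lookup in place of the real yarn-site.xml plumbing; the property key shown is hypothetical:

{code:java}
public class CgroupsMountSketch {
  // Illustrative property key only; the real key would be defined by the patch.
  static final String TARGET_PROP = "yarn.nodemanager.docker.cgroups.target";
  static final String DEFAULT_TARGET = "/sys/fs/cgroup";

  /** Builds the -v argument that mounts the host cgroups hierarchy
   *  read-only at an admin-chosen path inside the container. */
  static String cgroupsVolumeArg(java.util.Properties conf, String hostCgroupsDir) {
    String target = conf.getProperty(TARGET_PROP, DEFAULT_TARGET);
    return hostCgroupsDir + ":" + target + ":ro";
  }

  public static void main(String[] args) {
    java.util.Properties conf = new java.util.Properties();
    conf.setProperty(TARGET_PROP, "/cgroups");   // legacy in-container path
    // Would be passed to docker run as: -v /sys/fs/cgroup:/cgroups:ro
    System.out.println("-v " + cgroupsVolumeArg(conf, "/sys/fs/cgroup"));
  }
}
{code}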



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8097) Add support for Docker env-file switch

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8097:
--
Labels: Docker  (was: )

> Add support for Docker env-file switch
> --
>
> Key: YARN-8097
> URL: https://issues.apache.org/jira/browse/YARN-8097
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0
>Reporter: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8097.001.patch
>
>
> There are two different ways to pass user environment variables to docker: the 
> -e flag, and --env-file, which references a file containing environment 
> variable key/value pairs. It would be nice to have a way to express the 
> env-file from HDFS, localize the .env file into the container's localized 
> directory, and pass the --env-file flag to the docker run command. This 
> approach would prevent ENV-based passwords from showing up in log files.
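A minimal sketch of the localization step described above, using only the standard library; the file name and method are illustrative, while --env-file itself is the real docker run flag (it expects KEY=VALUE lines):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EnvFileSketch {
  /** Writes KEY=VALUE lines (docker's --env-file format) into the
   *  container's localized directory and returns the flag to append. */
  static String writeEnvFile(Path localizedDir, Map<String, String> env)
      throws IOException {
    Path envFile = localizedDir.resolve("container.env");
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, String> e : env.entrySet()) {
      lines.add(e.getKey() + "=" + e.getValue());
    }
    Files.write(envFile, lines);
    // Values never appear on the docker run command line, so secrets
    // stay out of the NodeManager launch logs.
    return "--env-file=" + envFile;
  }
}
{code}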



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8207) Docker container launch use popen have risk of shell expansion

2018-05-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8207:
--
Labels: Docker  (was: )

> Docker container launch use popen have risk of shell expansion
> --
>
> Key: YARN-8207
> URL: https://issues.apache.org/jira/browse/YARN-8207
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8207.001.patch, YARN-8207.002.patch, 
> YARN-8207.003.patch, YARN-8207.004.patch, YARN-8207.005.patch
>
>
> The container-executor code uses a string buffer to construct the docker run 
> command and passes that buffer to popen for execution. popen spawns a shell to 
> run the command, so some arguments to docker run remain vulnerable to shell 
> expansion. A possible solution is to convert from a char * buffer to a string 
> array passed to execv, which avoids shell expansion.
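The actual fix belongs in the C container-executor (an argv array handed to execv), but the same principle can be sketched in Java for illustration: hand the launcher an argument vector instead of a single shell string, so nothing is shell-expanded. Names and the marker value below are illustrative:

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class NoShellLaunchSketch {
  public static void main(String[] args) throws IOException {
    // A value an application could control, e.g. an env var or mount path.
    String userControlled = "$(touch /tmp/pwned)";

    // Risky pattern (what popen does in C): one string handed to a shell,
    // so $(...), backticks, ;, && and friends are interpreted.
    List<String> viaShell = Arrays.asList(
        "/bin/sh", "-c",
        "docker run -e MARKER=" + userControlled + " busybox true");
    System.out.println("Unsafe form (shown for contrast, not run): " + viaShell);

    // Safer pattern (what execv with an argv array does): each argument is
    // passed to the docker binary verbatim, with no shell in between.
    List<String> viaArgv = Arrays.asList(
        "docker", "run", "-e", "MARKER=" + userControlled, "busybox", "true");

    // ProcessBuilder executes the program directly; nothing expands $(...)
    // in this form.
    new ProcessBuilder(viaArgv).inheritIO().start();
  }
}
{code}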



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


