[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981784#comment-13981784 ] Hadoop QA commented on YARN-1885: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642040/YARN-1885.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3634//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3634//console This message is automatically generated. 
> yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Wangda Tan > Attachments: YARN-1885.patch, YARN-1885.patch > > > During our HA testing we have seen cases where yarn application logs are not > available through the cli but i can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1696) Document RM HA
[ https://issues.apache.org/jira/browse/YARN-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-1696: --- Attachment: YARN-1696-3.patch I came across this JIRA while trying to set up RM HA and updated the patch based on that experience. The attached patch includes the following changes:
- added a link to site.xml,
- changed the file name,
- removed the parts duplicated in the "ResourceManager Restart" page,
- reordered some subsections after removing the duplication,
- added client configurations to the table,
- added sample configurations,
- added a description of the CLI.
Sorry for breaking in, [~kkambatl]. > Document RM HA > -- > > Key: YARN-1696 > URL: https://issues.apache.org/jira/browse/YARN-1696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Priority: Blocker > Attachments: YARN-1696-3.patch, YARN-1696.2.patch, yarn-1696-1.patch > > > Add documentation for RM HA. Marking this a blocker for 2.4 as this is > required to call RM HA Stable and ready for public consumption. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1885: - Attachment: YARN-1885.patch Uploaded a new patch that fixes the NPE in the unit tests. > yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Wangda Tan > Attachments: YARN-1885.patch, YARN-1885.patch > > > During our HA testing we have seen cases where yarn application logs are not > available through the cli but i can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1987) Wrapper for leveldb DBIterator to aid in handling database exceptions
Jason Lowe created YARN-1987: Summary: Wrapper for leveldb DBIterator to aid in handling database exceptions Key: YARN-1987 URL: https://issues.apache.org/jira/browse/YARN-1987 Project: Hadoop YARN Issue Type: Improvement Reporter: Jason Lowe Assignee: Jason Lowe Per discussions in YARN-1984 and MAPREDUCE-5652, it would be nice to have a utility wrapper around leveldb's DBIterator to translate the raw RuntimeExceptions it can throw into DBExceptions to make it easier to handle database errors while iterating. -- This message was sent by Atlassian JIRA (v6.2#6252)
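The wrapper idea described in YARN-1987 can be sketched roughly as follows. This is a minimal illustration, not the eventual patch: to stay self-contained it wraps a plain `java.util.Iterator` instead of leveldb's `DBIterator`, and `DbException`, `SafeIterator` and `SafeIteratorDemo` are hypothetical names invented for the example.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a database-level exception type (not the leveldb class).
class DbException extends RuntimeException {
    DbException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical wrapper: delegates to an underlying iterator and converts any
// other RuntimeException it throws into a DbException.
class SafeIterator<T> {
    private final Iterator<T> delegate;

    SafeIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    boolean hasNext() {
        try {
            return delegate.hasNext();
        } catch (DbException e) {
            throw e; // already the desired type, pass it through unchanged
        } catch (RuntimeException e) {
            throw new DbException("hasNext failed: " + e.getMessage(), e);
        }
    }

    T next() {
        try {
            return delegate.next();
        } catch (DbException e) {
            throw e;
        } catch (RuntimeException e) {
            throw new DbException("next failed: " + e.getMessage(), e);
        }
    }
}

class SafeIteratorDemo {
    public static void main(String[] args) {
        SafeIterator<String> it = new SafeIterator<>(List.of("k1", "k2").iterator());
        try {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        } catch (DbException e) {
            // Callers only need to handle one exception type while iterating.
            System.err.println("database error: " + e.getMessage());
        }
    }
}
```

With a wrapper of this shape, code that iterates over the store needs a single `catch` for the database exception type instead of guarding against arbitrary runtime exceptions.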
[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981723#comment-13981723 ] Hadoop QA commented on YARN-1885: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642016/YARN-1885.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3633//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3633//console This message is automatically generated. 
> yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Wangda Tan > Attachments: YARN-1885.patch > > > During our HA testing we have seen cases where yarn application logs are not > available through the cli but i can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-483) Improve documentation on log aggregation in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981650#comment-13981650 ] Hudson commented on YARN-483: - SUCCESS: Integrated in Hadoop-trunk-Commit #5575 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5575/]) YARN-483. Improve documentation on log aggregation in yarn-default.xml (Akira Ajisaka via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1590150) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > Improve documentation on log aggregation in yarn-default.xml > > > Key: YARN-483 > URL: https://issues.apache.org/jira/browse/YARN-483 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Akira AJISAKA > Fix For: 2.5.0 > > Attachments: YARN-483.2.patch, YARN-483.patch > > > The current documentation for log aggregation is > {code} > <property> > <description>Whether to enable log aggregation</description> > <name>yarn.log-aggregation-enable</name> > <value>false</value> > </property> > {code} > This could be improved to explain what enabling log aggregation does. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981632#comment-13981632 ] Abin Shahab commented on YARN-1983: --- Right, there may be other, better alternatives. What kinds of extensibility are we supporting here? What kind of containers will YARN be able to support? What kind of configurations would these need? From the answers to these questions we can start talking about the abstractions needed to let YARN launch all kinds of containers. > Support heterogeneous container types at runtime on YARN > > > Key: YARN-1983 > URL: https://issues.apache.org/jira/browse/YARN-1983 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du > > Different container types (default, LXC, docker, VM box, etc.) have different > semantics on isolation of security, namespace/env, performance, etc. > Per discussions in YARN-1964, we have some good thoughts on supporting > different types of containers running on YARN and specified by application at > runtime which largely enhance YARN's flexibility to meet heterogenous app's > requirement on isolation at runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1885: - Attachment: YARN-1885.patch Attached a patch that implements the approach described in my last comment. I would appreciate feedback on it. Thanks! > yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Wangda Tan > Attachments: YARN-1885.patch > > > During our HA testing we have seen cases where yarn application logs are not > available through the cli but i can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981630#comment-13981630 ] Wangda Tan commented on YARN-1885: --

*This happens when an application has completed in the RM, but the NM cannot receive the application clean-up message after the RM restarts. It causes a series of problems, including but not limited to:*
* Log aggregation sometimes does not work.
* The application is shown as "RUNNING" on the NM's web page although it has already terminated in the RM.

*The bug can be reproduced as follows (in a recovery-enabled cluster):*
1) Submit an application (with a deliberate error that causes the AM to fail) to the RM.
2) Restart the RM before the application's state transitions to FAILED.
3) After the RM has restarted and the NM has registered, the application state becomes FAILED in the RM, but the application is still shown as running on the NM side.

*Several places contribute to this problem:*
1) A race condition in ResourceTrackerService.registerNodeManager. The logic that handles container statuses,
{code}
if (!request.getContainerStatuses().isEmpty()) {
  LOG.info("received container statuses on node manager register :"
      + request.getContainerStatuses());
  for (ContainerStatus containerStatus : request.getContainerStatuses()) {
    handleContainerStatus(containerStatus);
  }
}
{code}
runs before the RMNodeImpl instance is created:
{code}
RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,
    resolve(host), ResourceOption.newInstance(capability,
        RMNode.OVER_COMMIT_TIMEOUT_MILLIS_DEFAULT), nodeManagerVersion);

RMNode oldNode = this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode);
if (oldNode == null) {
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(nodeId, RMNodeEventType.STARTED));
} else {
  LOG.info("Reconnect from the node at: " + host);
  this.nmLivelinessMonitor.unregister(nodeId);
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeReconnectEvent(nodeId, rmNode));
}
{code}
So RMAppImpl.FinalTransition finishes the application but cannot notify the corresponding RMNode.
2) RMAppAttempt cannot get the full ranNodes set after a restart (RMAppAttempt is set to the LAUNCHED state after a restart).

*Proposal*
1) Include the full list of running applications when the NM registers with the RM.
2) ResourceTrackerService (RTS for short) will then:
* If the RMApp is not in a final state, add the RMNode to the RMAppAttempt's ranNodes.
* If the RMApp is already in a final state, send an RMNodeCleanAppEvent to the RMNode.
3) Address the race condition described above.

> yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Wangda Tan > > During our HA testing we have seen cases where yarn application logs are not > available through the cli but i can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
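The proposal in the comment above can be illustrated with a small model. This is only a sketch under stated assumptions: `RegistrationSketch` and all of its fields and methods are hypothetical simplifications that use strings for node and application ids, and none of them are the actual YARN classes or the attached patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the proposal; none of these are YARN types.
class RegistrationSketch {
    // Application id -> whether the RM already considers it to be in a final state.
    private final Map<String, Boolean> appInFinalState = new HashMap<>();
    // Application id -> nodes the application is known to have run on ("ranNodes").
    private final Map<String, Set<String>> ranNodes = new HashMap<>();
    // Nodes that have a record on the RM side.
    private final Set<String> registeredNodes = new HashSet<>();
    // Clean-up notifications "sent" to nodes, recorded as "nodeId:appId".
    final List<String> cleanupEvents = new ArrayList<>();

    void recordApp(String appId, boolean inFinalState) {
        appInFinalState.put(appId, inFinalState);
    }

    Set<String> ranNodesOf(String appId) {
        return ranNodes.getOrDefault(appId, Set.of());
    }

    // Called when a node registers and reports the applications it still runs.
    void registerNode(String nodeId, List<String> runningApps) {
        // Create the node record first, so later notifications have a target
        // (this ordering is what point 3 of the proposal is about).
        registeredNodes.add(nodeId);
        for (String appId : runningApps) {
            Boolean inFinalState = appInFinalState.get(appId);
            if (inFinalState == null) {
                continue; // unknown application: not covered by the proposal
            }
            if (inFinalState) {
                // Application already finished: tell the node to clean it up.
                cleanupEvents.add(nodeId + ":" + appId);
            } else {
                // Application still active: remember that it ran on this node.
                ranNodes.computeIfAbsent(appId, k -> new HashSet<>()).add(nodeId);
            }
        }
    }
}
```

In this model, a node that re-registers after an RM restart and still reports a finished application receives a clean-up notification, which is the step the comment identifies as missing.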
[jira] [Assigned] (YARN-1885) yarn logs command does not provide the application logs for some applications
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-1885: Assignee: Wangda Tan > yarn logs command does not provide the application logs for some applications > - > > Key: YARN-1885 > URL: https://issues.apache.org/jira/browse/YARN-1885 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Arpit Gupta >Assignee: Wangda Tan > > During our HA testing we have seen cases where yarn application logs are not > available through the cli but i can look at AM logs through the UI. RM was > also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981612#comment-13981612 ] Junping Du commented on YARN-1983: -- Thanks [~ashahab] for volunteering on this. I think we could discuss more options before quickly jumping on the effort of extending ContainerRequest in case other people may have some better ideas. Thoughts? > Support heterogeneous container types at runtime on YARN > > > Key: YARN-1983 > URL: https://issues.apache.org/jira/browse/YARN-1983 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du > > Different container types (default, LXC, docker, VM box, etc.) have different > semantics on isolation of security, namespace/env, performance, etc. > Per discussions in YARN-1964, we have some good thoughts on supporting > different types of containers running on YARN and specified by application at > runtime which largely enhance YARN's flexibility to meet heterogenous app's > requirement on isolation at runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-483) Improve documentation on log aggregation in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981596#comment-13981596 ] Sandy Ryza commented on YARN-483: - +1 > Improve documentation on log aggregation in yarn-default.xml > > > Key: YARN-483 > URL: https://issues.apache.org/jira/browse/YARN-483 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Akira AJISAKA > Attachments: YARN-483.2.patch, YARN-483.patch > > > The current documentation for log aggregation is > {code} > <property> > <description>Whether to enable log aggregation</description> > <name>yarn.log-aggregation-enable</name> > <value>false</value> > </property> > {code} > This could be improved to explain what enabling log aggregation does. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981589#comment-13981589 ] Sandy Ryza commented on YARN-1864: -- I think it's better to leave out case 4. The right behavior on it is fuzzy, and things are simpler if the results returned by QueuePlacementPolicy are only a function of the configuration. No other comments than that at the moment. > Fair Scheduler Dynamic Hierarchical User Queues > --- > > Key: YARN-1864 > URL: https://issues.apache.org/jira/browse/YARN-1864 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Ashwin Shankar > Labels: scheduler > Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt, YARN-1864-v3.txt > > > In Fair Scheduler, we want to be able to create user queues under any parent > queue in the hierarchy. For example, say user1 submits a job to a parent queue > called root.allUserQueues; we want to be able to create a new queue called > root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted > by this user to root.allUserQueues will be run in this newly created > root.allUserQueues.user1. > This is very similar to the 'user-as-default' feature in Fair Scheduler which > creates user queues under root queue. But we want the ability to create user > queues under ANY parent queue. > Why do we want this? > 1. Preemption: these dynamically created user queues can preempt each other > if their fair share is not met. So there is fairness among users. > User queues can also preempt other non-user leaf queues if below fair > share. > 2. Allocation to user queues: we want all the user queries (ad hoc) to consume > only a fraction of resources in the shared cluster. By creating this > feature, we could do that by giving a fair share to the parent user queue > which is then redistributed to all the dynamically created user queues. -- This message was sent by Atlassian JIRA (v6.2#6252)
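The queue naming described in the YARN-1864 issue text can be sketched as below. This is an illustration only: `UserQueueSketch`, `placeJob` and `queueCount` are hypothetical names, and the sketch models nothing beyond "create the per-user child queue on first use, reuse it afterwards"; it is not the Fair Scheduler's placement code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of per-user queues created under an arbitrary parent queue.
class UserQueueSketch {
    // Names of the queues created so far, in creation order.
    private final Set<String> queues = new LinkedHashSet<>();

    // Returns the queue a job from `user` submitted to `parentQueue` would run in,
    // creating the child queue the first time it is needed.
    String placeJob(String parentQueue, String user) {
        String queueName = parentQueue + "." + user;
        queues.add(queueName); // no effect if the queue already exists
        return queueName;
    }

    int queueCount() {
        return queues.size();
    }
}

class UserQueueDemo {
    public static void main(String[] args) {
        UserQueueSketch scheduler = new UserQueueSketch();
        // Both jobs from user1 land in the same dynamically created queue.
        System.out.println(scheduler.placeJob("root.allUserQueues", "user1"));
        System.out.println(scheduler.placeJob("root.allUserQueues", "user1"));
    }
}
```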
[jira] [Commented] (YARN-1983) Support heterogeneous container types at runtime on YARN
[ https://issues.apache.org/jira/browse/YARN-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981509#comment-13981509 ] Abin Shahab commented on YARN-1983: --- After YARN-1964, I can work on extending the containerRequest so that it can accommodate these changes at run time. Abin > Support heterogeneous container types at runtime on YARN > > > Key: YARN-1983 > URL: https://issues.apache.org/jira/browse/YARN-1983 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du > > Different container types (default, LXC, docker, VM box, etc.) have different > semantics on isolation of security, namespace/env, performance, etc. > Per discussions in YARN-1964, we have some good thoughts on supporting > different types of containers running on YARN and specified by application at > runtime which largely enhance YARN's flexibility to meet heterogenous app's > requirement on isolation at runtime. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1986) After upgrade from 2.2.0 to 2.4.0, NPE on first job start.
Jon Bringhurst created YARN-1986: Summary: After upgrade from 2.2.0 to 2.4.0, NPE on first job start. Key: YARN-1986 URL: https://issues.apache.org/jira/browse/YARN-1986 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jon Bringhurst

After upgrade from 2.2.0 to 2.4.0, NPE on first job start. After RM was restarted, the job runs without a problem.
{noformat}
19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
at java.lang.Thread.run(Thread.java:744)
19:11:13,443 INFO ResourceManager:604 - Exiting, bbye..
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981497#comment-13981497 ] Jason Lowe commented on YARN-1985: -- The exit status should be whatever exit status came from the process when it exited. When a container is killed the NM first sends a SIGTERM and then a short time later (250 msec IIRC) it sends SIGKILL. A process that exits with a status code of 0 despite receiving SIGTERM could explain the behavior. It could also happen if the container exited on its own after the NM logged that it was going to kill it but before it actually tried to kill it. Looking at the DefaultContainerExecutor code it certainly appears that the process being killed must have returned an exit code of zero unless you are seeing logs such as "Exit code from container container_1398429077682_0006_02_05 is : " in the logs. I'm not sure exactly what's being run in the container, but checking if that will return an exit code of 0 despite being killed by SIGTERM seems like the next best place to look. > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky >Priority: Minor > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
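When checking the hypothesis in the comment above, it can help to spell out what an exit status would look like in each case. The sketch below assumes the common Unix convention that a process ended by signal N is reported with status 128 + N; that convention, and the names `ExitStatusSketch` and `describe`, are assumptions for illustration and are not taken from the NodeManager code.

```java
// Hypothetical helper that interprets a process exit status, assuming the
// common Unix convention of 128 + signal number for signal-terminated processes.
class ExitStatusSketch {
    static final int SIGKILL = 9;
    static final int SIGTERM = 15;

    static String describe(int exitCode) {
        if (exitCode == 0) {
            // Also what a process reports if it traps SIGTERM and then exits cleanly.
            return "exited with status 0";
        }
        if (exitCode == 128 + SIGTERM) {
            return "terminated by SIGTERM";
        }
        if (exitCode == 128 + SIGKILL) {
            return "killed by SIGKILL";
        }
        return "exited with status " + exitCode;
    }

    public static void main(String[] args) {
        for (int code : new int[] {0, 143, 137, 1}) {
            System.out.println(code + ": " + describe(code));
        }
    }
}
```

Under that convention, a status of 0 for a container that was supposedly killed is consistent with either explanation in the comment: the process handled SIGTERM and exited cleanly, or it exited on its own before the signal was delivered.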
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981489#comment-13981489 ] Hadoop QA commented on YARN-1964: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641983/yarn-1964-docker.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3632//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3632//console This message is automatically generated. 
> Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleg Zhurakousky updated YARN-1985: --- Priority: Minor (was: Major) > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky >Priority: Minor > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981474#comment-13981474 ] Oleg Zhurakousky commented on YARN-1985: Actually, a bit of good news. The other two containers didn't start because one of my nodes had its date/time messed up, resulting in {code} org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. This token is expired. current time is 1398449721411 found 1398448925681 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) . . . {code} So handling the 'onStartContainerError' event would do. This makes it much less of an issue and I can work around it (actually already did), but the fact that _ExitStatus_ for the containers that did start was 0 is still a problem. Downgrading it to minor. > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky >Priority: Minor > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
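The two numbers in the exception above look like epoch milliseconds, so the size of the clock problem can be estimated directly. The sketch below assumes that the "found" value is the token's expiry time; that reading, and the names `TokenSkewSketch` and `expiredFor`, are assumptions made for illustration.

```java
import java.time.Duration;

// Hypothetical helper: how long a token had been expired, judged by the
// clock of the node that rejected it.
class TokenSkewSketch {
    static Duration expiredFor(long currentTimeMillis, long expiryMillis) {
        return Duration.ofMillis(currentTimeMillis - expiryMillis);
    }

    public static void main(String[] args) {
        // Values copied from the exception message quoted in the comment.
        Duration gap = expiredFor(1398449721411L, 1398448925681L);
        System.out.println(gap.toMillis() + " ms"); // prints "795730 ms"
        System.out.println(gap.toMinutes() + " min"); // prints "13 min"
    }
}
```

A gap of roughly 13 minutes between the node's clock and the token's expiry is in line with the explanation that one node had its date/time set incorrectly.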
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981445#comment-13981445 ] Hadoop QA commented on YARN-1063: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641970/YARN-1063.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1279 javac compiler warnings (more than the trunk's current 1278 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3630//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3630//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3630//console This message is automatically generated. > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: Windows >Reporter: Kyle Leckie >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.patch > > > h1. 
Summary: > Securing a Hadoop cluster requires constructing some form of security > boundary around the processes executed in YARN containers. Isolation based on > Windows user isolation seems most feasible. This approach is similar to the > approach taken by the existing LinuxContainerExecutor. The current patch to > winutils.exe adds the ability to create a process as a domain user. > h1. Alternative Methods considered: > h2. Process rights limited by security token restriction: > On Windows, access decisions are made by examining the security token of a > process. It is possible to spawn a process with a restricted security token. > Any of the rights granted by SIDs of the default token may be restricted. It > is possible to see this in action by examining the security token of a > sandboxed process launched by a web browser. Typically the launched process > will have a fully restricted token and need to access machine resources > through a dedicated broker process that enforces a custom security policy. > This broker process mechanism would break compatibility with the typical > Hadoop container process. The container process must be able to utilize > standard function calls for disk and network IO. I performed some work > looking at ways to ACL the local files to the specific launched process without > granting rights to other processes launched on the same machine but found > this to be an overly complex solution. > h2. Relying on APP containers: > Recent versions of Windows have the ability to launch processes within an > isolated container. Application containers are supported for execution of > WinRT based executables. This method was ruled out due to the lack of > official support for standard Windows APIs. At some point in the future > Windows may support functionality similar to BSD jails or Linux containers; > at that point support for containers should be added. > h1. Create As User Feature Description: > h2.
Usage: > A new sub command was added to the set of task commands. Here is the syntax: > winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] > Some notes: > * The username specified is in the format of "user@domain" > * The machine executing this command must be joined to the domain of the user > specified > * The domain controller must allow the account executing the command access > to the user information. For this join the account to the predefined group > labeled "Pre-Windows 2000 Compatible Access" > * The account running the command must have several rights on the local > machine. These can be managed manually using secpol.msc: > ** "Act as part of the operating system" - SE_TCB_NAME > ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME > ** "Adju
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964: -- Attachment: yarn-1964-docker.patch Trunk-patch with test passing. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1681) When "banned.users" is not set in LCE's container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS will receive unclear error message
[ https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981421#comment-13981421 ] Junping Du commented on YARN-1681: -- Nice catch, [~wzc1989]! Patch looks good to me. However, would you like to add a unit test in test-container-executor.c to cover this case? > When "banned.users" is not set in LCE's container-executor.cfg, submit job > with user in DEFAULT_BANNED_USERS will receive unclear error message > --- > > Key: YARN-1681 > URL: https://issues.apache.org/jira/browse/YARN-1681 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Zhichun Wu >Priority: Minor > Labels: container, usability > Attachments: YARN-1681.patch > > > When using LCE in a secure setup, if "banned.users" is not set in > container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS > ("mapred", "hdfs", "bin", 0) will receive unclear error message. > for example, if we use hdfs to submit a mr job, we may see the following the > yarn app overview page: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: > {code} > while the prefer error message may look like: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: Requested user hdfs is banned > {code} > just a minor bug and I would like to start contributing to hadoop-common with > it:) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981409#comment-13981409 ] Oleg Zhurakousky commented on YARN-1985: My theory is confirmed. After fixing my bug, the application finished with SUCCEEDED status, which is obviously wrong. What makes it an even bigger problem, IMHO, is that YARN seems to have decided not to even attempt to start the other two containers, which creates an interesting dilemma. How do you monitor overall application completion when 4 containers are allocated, 2 are started and killed, and 2 never start? Sure, I can use an AtomicInteger and increment/decrement it. But if I don't have any guarantee around container start attempts, I may be exiting too soon. For example, in my case such a counter would go from 0 to 2 and then back to 0, signifying completion, and as I am exiting YARN may decide to start another container. > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
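The counting problem described in the comment above can be sketched as follows. This is only an illustration under stated assumptions, not YARN code: CompletionTracker is a hypothetical helper, and it assumes the AM knows up front how many containers it requested and receives one completion report per container. Instead of counting running containers up and down (which can touch 0 before every container has started), it counts down from the requested total.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for an application master; not part of the YARN API. */
final class CompletionTracker {
    // Number of requested containers that have not yet reported completion.
    private final AtomicInteger outstanding;

    CompletionTracker(int requestedContainers) {
        this.outstanding = new AtomicInteger(requestedContainers);
    }

    /** Call once for every completion report the AM receives. */
    void containerCompleted() {
        outstanding.decrementAndGet();
    }

    /** True only after every requested container has reported completion. */
    boolean isDone() {
        return outstanding.get() <= 0;
    }
}
```

With 4 requested containers, two completions leave isDone() false, so the AM would keep waiting rather than exit early. Containers that are never started would still need a timeout or an explicit release, which this sketch does not cover.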
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981401#comment-13981401 ] Hadoop QA commented on YARN-1964: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641977/yarn-1964-branch-2.2.0-docker.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3631//console This message is automatically generated. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964: -- Attachment: yarn-1964-branch-2.2.0-docker.patch This should pass on the branch. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1063: --- Attachment: YARN-1063.3.patch Now with more whitespace > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: Windows >Reporter: Kyle Leckie >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.patch > > > h1. Summary: > Securing a Hadoop cluster requires constructing some form of security > boundary around the processes executed in YARN containers. Isolation based on > Windows user isolation seems most feasible. This approach is similar to the > approach taken by the existing LinuxContainerExecutor. The current patch to > winutils.exe adds the ability to create a process as a domain user. > h1. Alternative Methods considered: > h2. Process rights limited by security token restriction: > On Windows access decisions are made by examining the security token of a > process. It is possible to spawn a process with a restricted security token. > Any of the rights granted by SIDs of the default token may be restricted. It > is possible to see this in action by examining the security token of a > sandboxed process launched by a web browser. Typically the launched process > will have a fully restricted token and need to access machine resources > through a dedicated broker process that enforces a custom security policy. > This broker process mechanism would break compatibility with the typical > Hadoop container process. The Container process must be able to utilize > standard function calls for disk and network IO. 
I performed some work > looking at ways to ACL the local files to the specific launched without > granting rights to other processes launched on the same machine but found > this to be an overly complex solution. > h2. Relying on APP containers: > Recent versions of windows have the ability to launch processes within an > isolated container. Application containers are supported for execution of > WinRT based executables. This method was ruled out due to the lack of > official support for standard windows APIs. At some point in the future > windows may support functionality similar to BSD jails or Linux containers, > at that point support for containers should be added. > h1. Create As User Feature Description: > h2. Usage: > A new sub command was added to the set of task commands. Here is the syntax: > winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] > Some notes: > * The username specified is in the format of "user@domain" > * The machine executing this command must be joined to the domain of the user > specified > * The domain controller must allow the account executing the command access > to the user information. For this join the account to the predefined group > labeled "Pre-Windows 2000 Compatible Access" > * The account running the command must have several rights on the local > machine. These can be managed manually using secpol.msc: > ** "Act as part of the operating system" - SE_TCB_NAME > ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME > ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME > * The launched process will not have rights to the desktop so will not be > able to display any information or create UI. > * The launched process will have no network credentials. Any access of > network resources that requires domain authentication will fail. > h2. Implementation: > Winutils performs the following steps: > # Enable the required privileges for the current process. 
> # Register as a trusted process with the Local Security Authority (LSA). > # Create a new logon for the user passed on the command line. > # Load/Create a profile on the local machine for the new logon. > # Create a new environment for the new logon. > # Launch the new process in a job with the task name specified and using the > created logon. > # Wait for the JOB to exit. > h2. Future work: > The following work was scoped out of this check in: > * Support for non-domain users or machine that are not domain joined. > * Support for privilege isolation by running the task launcher in a high > privilege service with access over an ACLed named pipe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981332#comment-13981332 ] Oleg Zhurakousky commented on YARN-1985: Also, you are saying there are only 3 states. What about all those LOCALIZING, LOCALIZED, KILLING, ACQUIRED, etc.? Anyway, here is the NM log fragment:
{code}
2014-04-25 12:47:45,230 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 12510
2014-04-25 12:47:45,230 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1398429077682_0006_02_03 transitioned from RUNNING to KILLING
2014-04-25 12:47:45,230 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1398429077682_0006_02_03
2014-04-25 12:47:45,630 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1398429077682_0006_02_05 by user oleg
2014-04-25 12:47:45,631 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=oleg IP=192.168.19.10 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1398429077682_0006 CONTAINERID=container_1398429077682_0006_02_05
2014-04-25 12:47:45,631 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1398429077682_0006_02_05 to application application_1398429077682_0006
2014-04-25 12:47:45,632 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1398429077682_0006_02_05 transitioned from NEW to LOCALIZING
2014-04-25 12:47:45,632 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1398429077682_0006
2014-04-25 12:47:45,634 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1398429077682_0006_02_05 transitioned from LOCALIZING to LOCALIZED
2014-04-25 12:47:45,660 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1398429077682_0006_02_05 transitioned from LOCALIZED to RUNNING
2014-04-25 12:47:45,677 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /tmp/hadoop-oleg/nm-local-dir/usercache/oleg/appcache/application_1398429077682_0006/container_1398429077682_0006_02_05/default_container_executor.sh]
2014-04-25 12:47:48,230 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1398429077682_0006_02_05
2014-04-25 12:47:48,248 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 12598 for container-id container_1398429077682_0006_02_05: 39.7 MB of 256 MB physical memory used; 1.8 GB of 537.6 MB virtual memory used
2014-04-25 12:47:48,248 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1398429077682_0006_02_05 running over twice the configured limit. Limit=563714432, current usage = 1985445888
2014-04-25 12:47:48,249 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=12598,containerID=container_1398429077682_0006_02_05] is running beyond virtual memory limits. Current usage: 39.7 MB of 256 MB physical memory used; 1.8 GB of 537.6 MB virtual memory used. Killing container.
Dump of the process-tree for container_1398429077682_0006_02_05 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
. . .
{code}
> YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981322#comment-13981322 ] Oleg Zhurakousky commented on YARN-1985: I'd agree with you, but the reported status is 0. {code} ExitStatus: 0, ] {code} > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1681) When "banned.users" is not set in LCE's container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS will receive unclear error message
[ https://issues.apache.org/jira/browse/YARN-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1681: -- Labels: container usability (was: container) > When "banned.users" is not set in LCE's container-executor.cfg, submit job > with user in DEFAULT_BANNED_USERS will receive unclear error message > --- > > Key: YARN-1681 > URL: https://issues.apache.org/jira/browse/YARN-1681 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Zhichun Wu >Priority: Minor > Labels: container, usability > Attachments: YARN-1681.patch > > > When using LCE in a secure setup, if "banned.users" is not set in > container-executor.cfg, submit job with user in DEFAULT_BANNED_USERS > ("mapred", "hdfs", "bin", 0) will receive unclear error message. > for example, if we use hdfs to submit a mr job, we may see the following the > yarn app overview page: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: > {code} > while the prefer error message may look like: > {code} > appattempt_1391353981633_0003_02 exited with exitCode: -1000 due to: > Application application_1391353981633_0003 initialization failed > (exitCode=139) with output: Requested user hdfs is banned > {code} > just a minor bug and I would like to start contributing to hadoop-common with > it:) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981312#comment-13981312 ] Jason Lowe commented on YARN-1985: -- There are only three states for a container: NEW, RUNNING, or COMPLETED. Note that COMPLETED does not imply success rather that the container is no longer running. In order to discern success or failure from a completed container one must examine the exit code of the container (i.e.: the ContainerStatus#getExitStatus method). Are both containers running over their memory limits or is only one running over and somehow both are being killed? That's where the RM/NM logs would help. > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
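The point above, that COMPLETED only means "no longer running" and the exit status has to be examined, might look roughly like this in an AM. This is a self-contained sketch with assumptions: ContainerOutcome and classify are hypothetical names, the two parameters stand in for whatever ContainerStatus#getExitStatus and the container diagnostics return, and treating non-empty diagnostics as a failure is a heuristic added here because elsewhere in this thread a killed container is reported with exit status 0.

```java
/** Hypothetical classifier for completed containers; not part of the YARN API. */
final class ContainerOutcome {
    enum Result { SUCCEEDED, FAILED }

    /**
     * Classify a completed container.
     * Assumption: exit status 0 with empty diagnostics means success;
     * anything else is treated as a failure.
     */
    static Result classify(int exitStatus, String diagnostics) {
        boolean cleanExit = exitStatus == 0;
        boolean hasDiagnostics = diagnostics != null && !diagnostics.trim().isEmpty();
        return (cleanExit && !hasDiagnostics) ? Result.SUCCEEDED : Result.FAILED;
    }
}
```

For example, classify(0, "Container ... is running beyond virtual memory limits") yields FAILED even though the exit status alone would suggest success.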
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981302#comment-13981302 ] Oleg Zhurakousky commented on YARN-1985: Actually, as I stated in my last comment, it's 2 containers. So out of four, 2 were started and killed immediately and 2 were not started at all. So while I have to fix my problem of properly counting how many containers were started vs finished/running, the real issue is that such a major error condition is reported essentially as success. Basically, if I didn't have a bug on my end which made my AM hang, I would probably end up seeing SUCCEEDED in the RM console. I am fixing it now and will follow up if I do see SUCCEEDED. But in any event it's confusing when it simply reports COMPLETED while describing a major error. > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981296#comment-13981296 ] Jason Lowe commented on YARN-1985: -- Do you have the relevant portions of the RM log for these 4 containers showing it has marked them completed? If these all occurred on the same node, the relevant NM log would be great as well. > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981285#comment-13981285 ] Oleg Zhurakousky commented on YARN-1985: Actually, the 4 COMPLETE reports are log duplication, so it is actually two. The other two containers didn't even start. That is fine, but the real issue is that while it's clearly an ERROR condition, it is reported as a simple COMPLETE. > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly
[ https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1984: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1530 > LeveldbTimelineStore does not handle db exceptions properly > --- > > Key: YARN-1984 > URL: https://issues.apache.org/jira/browse/YARN-1984 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Jason Lowe > > The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions > rather than IOException which can easily leak up the stack and kill threads > (e.g.: the deletion thread). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
[ https://issues.apache.org/jira/browse/YARN-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981264#comment-13981264 ] Oleg Zhurakousky commented on YARN-1985: Just adding more info. For my 4-container app I get a single {code} - Received completed contaners callback: [ContainerStatus: [ContainerId: container_1398429077682_0005_01_03, State: COMPLETE, Diagnostics: Container [pid=11152,containerID=container_1398429077682_0005_01_03] is running beyond virtual memory limits. Current usage: 39.6 MB of 256 MB physical memory used; 1.8 GB of 537.6 MB virtual memory used. Killing container. . . . {code} and then 4 {code} State: COMPLETE,. . . State: COMPLETE,. . . State: COMPLETE,. . . State: COMPLETE,. . . {code} > YARN issues wrong state when "running beyond virtual memory limits" > --- > > Key: YARN-1985 > URL: https://issues.apache.org/jira/browse/YARN-1985 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.3.0 >Reporter: Oleg Zhurakousky > > When deploying YARN application with multiple containers and AM determines > that the resource limits been reached (e.g., virtual memory) it starts > killing *all* containers while reporting a *single* COMPLETED status > essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1985) YARN issues wrong state when "running beyond virtual memory limits"
Oleg Zhurakousky created YARN-1985: -- Summary: YARN issues wrong state when "running beyond virtual memory limits" Key: YARN-1985 URL: https://issues.apache.org/jira/browse/YARN-1985 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Oleg Zhurakousky When deploying YARN application with multiple containers and AM determines that the resource limits been reached (e.g., virtual memory) it starts killing *all* containers while reporting a *single* COMPLETED status essentially hanging AM waiting for more containers to report its state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1941) Yarn scheduler ACL improvement
[ https://issues.apache.org/jira/browse/YARN-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981190#comment-13981190 ] Remus Rusanu commented on YARN-1941: Keeping the root at default _acl goes against the 'secure by default' principle. This is not an issue of consistency between root and other queues; the issue is the hierarchical check. Leaf queues defer to their parent:
{code}
public boolean hasAccess(QueueACL acl, UserGroupInformation user) {
  // Check if the leaf-queue allows access
  synchronized (this) {
    if (acls.get(acl).isUserAllowed(user)) {
      return true;
    }
  }
  // Check if parent-queue allows access
  return getParent().hasAccess(acl, user);
}
{code}
and parents further defer to their parents:
{code}
@Override
public boolean hasAccess(QueueACL acl, UserGroupInformation user) {
  synchronized (this) {
    if (acls.get(acl).isUserAllowed(user)) {
      return true;
    }
  }
  if (parent != null) {
    return parent.hasAccess(acl, user);
  }
  return false;
}
{code}
So ultimately the root ACLs cover every queue. With the default being '*', every queue is accessible to everyone. This is a fairly bad 'default' to have.
> Yarn scheduler ACL improvement > -- > > Key: YARN-1941 > URL: https://issues.apache.org/jira/browse/YARN-1941 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.3.0 >Reporter: Gordon Wang >Assignee: Gordon Wang > Labels: scheduler > > Defect: > 1. Currently, in Yarn Capacity Scheduler and Yarn Fair Scheduler, the queue > ACL is always checked when submitting a app to scheduler, regardless of the > property "yarn.acl.enable". > But for killing an app, the ACL is checked when yarn.acl.enable is set. > The behaviour is not consistent. > 2. default ACL for root queue is EVERYBODY_ACL( * ), while default ACL for > other queues is NOBODY_ACL( ). From users' view, this is error prone and not > easy to understand the ACL policy of Yarn scheduler. 
root queue should not be > so special compared with other parent queues. > For example, if I want to set capacity scheduler ACL, the ACL of root has to > be set explicitly. Otherwise, everyone can submit APP to yarn scheduler. > Because root queue ACL is EVERYBODY_ACL. > This is hard for user to administrate yarn scheduler. > So, I propose to improve the ACL of yarn scheduler in the following aspects. > 1. only enable scheduler queue ACL when yarn.acl.enable is set to true. > 2. set the default ACL of root queue as NOBODY_ACL( ). Make all the parent > queues' ACL consistent. > -- This message was sent by Atlassian JIRA (v6.2#6252)
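The effect of the hierarchical check discussed in this message can be reproduced with a small stand-alone model. This is a simplified sketch, not the scheduler's code: QueueModel is a hypothetical class, ACLs are reduced to a set of user names, and "*" is assumed to mean everybody.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of a scheduler queue with an ACL. */
final class QueueModel {
    private final QueueModel parent;   // null for the root queue
    private final Set<String> allowed; // "*" is assumed to mean everybody

    QueueModel(QueueModel parent, String... allowedUsers) {
        this.parent = parent;
        this.allowed = new HashSet<>(Arrays.asList(allowedUsers));
    }

    /** A user has access if this queue or any ancestor allows it. */
    boolean hasAccess(String user) {
        if (allowed.contains("*") || allowed.contains(user)) {
            return true;
        }
        return parent != null && parent.hasAccess(user);
    }
}
```

With a root created as new QueueModel(null, "*") and a leaf created with an empty ACL, leaf.hasAccess("alice") is true, because the check falls through to the root. With an empty root ACL, the leaf's own ACL decides.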
[jira] [Commented] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly
[ https://issues.apache.org/jira/browse/YARN-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981185#comment-13981185 ] Jason Lowe commented on YARN-1984: -- Ran across this while working with leveldb as part of MAPREDUCE-5652 and YARN-1336. There are two DBExceptions, NativeDB.DBException and leveldb.DBException. The former is derived from IOException raised by the low level JNI code, while the latter is derived from RuntimeException and is thrown by the JniDB wrapper code. To make matters worse, DBIterator throws _raw_ RuntimeException rather than the runtime DBException from its methods, so database errors can leak up the stack even if code is expecting the runtime DBException. The timeline store should be handling the runtime exceptions and treat them like I/O errors, at least to keep it from tearing down the deletion thread (if not other cases). We may want to create a wrapper utility class for DBIterator in YARN as a workaround so interacting with the database only requires handling of leveldb.DBException rather than also trying to wrestle with the raw RuntimeExceptions from the iterator. See the DBIterator wrapper class in https://issues.apache.org/jira/secure/attachment/12641927/MAPREDUCE-5652-v8.patch as a rough example. > LeveldbTimelineStore does not handle db exceptions properly > --- > > Key: YARN-1984 > URL: https://issues.apache.org/jira/browse/YARN-1984 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Jason Lowe > > The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions > rather than IOException which can easily leak up the stack and kill threads > (e.g.: the deletion thread). -- This message was sent by Atlassian JIRA (v6.2#6252)
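The wrapper idea mentioned above could look something like the following. This is a minimal sketch under assumptions, not the class from the MAPREDUCE-5652 patch: CheckedIterator is a hypothetical name, it wraps a plain java.util.Iterator rather than the leveldb DBIterator, and it converts runtime exceptions to IOException (following the "treat them like I/O errors" remark) instead of to leveldb.DBException.

```java
import java.io.IOException;
import java.util.Iterator;

/** Hypothetical wrapper that turns an iterator's runtime exceptions into IOException. */
final class CheckedIterator<T> {
    private final Iterator<T> delegate;

    CheckedIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    boolean hasNext() throws IOException {
        try {
            return delegate.hasNext();
        } catch (RuntimeException e) {
            // Surface the failure as a checked I/O error so callers must handle it.
            throw new IOException("iterator failure in hasNext", e);
        }
    }

    T next() throws IOException {
        try {
            return delegate.next();
        } catch (RuntimeException e) {
            // Note: this also converts NoSuchElementException at end of iteration.
            throw new IOException("iterator failure in next", e);
        }
    }
}
```

A caller such as a deletion thread would then catch IOException in its loop and keep running, rather than being torn down by an unexpected RuntimeException.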
[jira] [Created] (YARN-1984) LeveldbTimelineStore does not handle db exceptions properly
Jason Lowe created YARN-1984: Summary: LeveldbTimelineStore does not handle db exceptions properly Key: YARN-1984 URL: https://issues.apache.org/jira/browse/YARN-1984 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Jason Lowe The org.iq80.leveldb.DB and DBIterator methods throw runtime exceptions rather than IOException which can easily leak up the stack and kill threads (e.g.: the deletion thread). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1975) Used resources shows escaped html in CapacityScheduler and FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981005#comment-13981005 ] Hudson commented on YARN-1975: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1742 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1742/]) YARN-1975. Used resources shows escaped html in CapacityScheduler and FairScheduler page. Contributed by Mit Desai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1589859) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java > Used resources shows escaped html in CapacityScheduler and FairScheduler page > - > > Key: YARN-1975 > URL: https://issues.apache.org/jira/browse/YARN-1975 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0, 2.4.0 >Reporter: Nathan Roberts >Assignee: Mit Desai > Fix For: 3.0.0, 2.4.1 > > Attachments: YARN-1975.patch, screenshot-1975.png > > > Used resources displays as <memory:, vCores;> with capacity > scheduler -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1975) Used resources shows escaped html in CapacityScheduler and FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980994#comment-13980994 ] Hudson commented on YARN-1975: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1768 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1768/]) YARN-1975. Used resources shows escaped html in CapacityScheduler and FairScheduler page. Contributed by Mit Desai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1589859) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java
[jira] [Commented] (YARN-1975) Used resources shows escaped html in CapacityScheduler and FairScheduler page
[ https://issues.apache.org/jira/browse/YARN-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980931#comment-13980931 ] Hudson commented on YARN-1975: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #551 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/551/]) YARN-1975. Used resources shows escaped html in CapacityScheduler and FairScheduler page. Contributed by Mit Desai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1589859) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Labels: security windows (was: ) > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch > > > This work item represents the Java side changes required to implement a > secure windows container executor, based on the YARN-1063 changes on > native/winutils side. > Necessary changes include leveraging the winutils task createas to launch the > container process as the required user and a secure localizer (launch > localization as a separate process running as the container user). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Component/s: nodemanager
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Attachment: YARN-1972.1.patch Iteration 1. I will upload a short design soon to make the dry code a more palatable read.
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980754#comment-13980754 ]

Remus Rusanu commented on YARN-1063:

The patch applies fine on trunk for me. Not sure why it failed for Mr. Jenkins. I removed the trunk-win tags.

> Winutils needs ability to create task as domain user
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Environment: Windows
> Reporter: Kyle Leckie
> Assignee: Remus Rusanu
> Labels: security, windows
> Attachments: YARN-1063.2.patch, YARN-1063.patch
>
> h1. Summary:
> Securing a Hadoop cluster requires constructing some form of security
> boundary around the processes executed in YARN containers. Isolation based on
> Windows user isolation seems most feasible. This approach is similar to the
> approach taken by the existing LinuxContainerExecutor. The current patch to
> winutils.exe adds the ability to create a process as a domain user.
>
> h1. Alternative Methods considered:
> h2. Process rights limited by security token restriction:
> On Windows, access decisions are made by examining the security token of a
> process. It is possible to spawn a process with a restricted security token.
> Any of the rights granted by SIDs of the default token may be restricted. It
> is possible to see this in action by examining the security token of a
> sandboxed process launched by a web browser. Typically the launched process
> will have a fully restricted token and need to access machine resources
> through a dedicated broker process that enforces a custom security policy.
> This broker process mechanism would break compatibility with the typical
> Hadoop container process. The container process must be able to utilize
> standard function calls for disk and network IO. I performed some work
> looking at ways to ACL the local files to the specific launched process without
> granting rights to other processes launched on the same machine but found
> this to be an overly complex solution.
>
> h2. Relying on APP containers:
> Recent versions of Windows have the ability to launch processes within an
> isolated container. Application containers are supported for execution of
> WinRT based executables. This method was ruled out due to the lack of
> official support for standard Windows APIs. At some point in the future
> Windows may support functionality similar to BSD jails or Linux containers;
> at that point, support for containers should be added.
>
> h1. Create As User Feature Description:
> h2. Usage:
> A new sub command was added to the set of task commands. Here is the syntax:
> winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
> Some notes:
> * The username specified is in the format of "user@domain".
> * The machine executing this command must be joined to the domain of the user
> specified.
> * The domain controller must allow the account executing the command access
> to the user information. To allow this, add the account to the predefined group
> labeled "Pre-Windows 2000 Compatible Access".
> * The account running the command must have several rights on the local
> machine. These can be managed manually using secpol.msc:
> ** "Act as part of the operating system" - SE_TCB_NAME
> ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME
> ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME
> * The launched process will not have rights to the desktop, so it will not be
> able to display any information or create UI.
> * The launched process will have no network credentials. Any access of
> network resources that requires domain authentication will fail.
>
> h2. Implementation:
> Winutils performs the following steps:
> # Enable the required privileges for the current process.
> # Register as a trusted process with the Local Security Authority (LSA).
> # Create a new logon for the user passed on the command line.
> # Load/Create a profile on the local machine for the new logon.
> # Create a new environment for the new logon.
> # Launch the new process in a job with the task name specified and using the
> created logon.
> # Wait for the job to exit.
>
> h2. Future work:
> The following work was scoped out of this check-in:
> * Support for non-domain users or machines that are not domain joined.
> * Support for privilege isolation by running the task launcher in a high
> privilege service with access over an ACLed named pipe.

-- This message was sent by Atlassian JIRA (v6.2#6252)
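The `createAsUser` usage given in the YARN-1063 description could be assembled on the Java side along these lines. This is a minimal sketch, not code from the patch: the helper name `buildCommand`, the `winutils.exe` path, the task name, the sample user, and the choice to pass the command line as a single argument are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CreateAsUserCommandSketch {

  /**
   * Builds the argument list for
   *   winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
   * Hypothetical helper; passing COMMAND_LINE as one argument is an assumption.
   */
  static List<String> buildCommand(String winutilsPath, String taskName,
                                   String userName, String commandLine) {
    // The description states that the username uses the "user@domain" format.
    int at = userName.indexOf('@');
    boolean wellFormed = at > 0
        && at < userName.length() - 1
        && at == userName.lastIndexOf('@');
    if (!wellFormed) {
      throw new IllegalArgumentException("Expected user@domain, got: " + userName);
    }
    List<String> command = new ArrayList<>();
    command.add(winutilsPath);
    command.add("task");
    command.add("createAsUser");
    command.add(taskName);
    command.add(userName);
    command.add(commandLine);
    return command;
  }

  public static void main(String[] args) {
    List<String> command = buildCommand(
        "winutils.exe", "container_01", "alice@EXAMPLE", "cmd /c echo hello");
    System.out.println(String.join(" ", command));
    // On a domain-joined Windows host with the rights listed in the description,
    // new ProcessBuilder(command).start() would be the natural way to launch it.
  }
}
```

Validating the username before launching keeps a malformed account name from reaching winutils, where the failure would be harder to diagnose.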
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1063: --- Affects Version/s: (was: trunk-win)
[jira] [Assigned] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu reassigned YARN-1063: -- Assignee: Remus Rusanu
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1063: --- Labels: security windows (was: security)
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1063: --- Target Version/s: (was: trunk-win)
[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1063: --- Fix Version/s: (was: trunk-win) > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: Windows >Reporter: Kyle Leckie > Labels: security, windows > Attachments: YARN-1063.2.patch, YARN-1063.patch > > > h1. Summary: > Securing a Hadoop cluster requires constructing some form of security > boundary around the processes executed in YARN containers. Isolation based on > Windows user isolation seems most feasible. This approach is similar to the > approach taken by the existing LinuxContainerExecutor. The current patch to > winutils.exe adds the ability to create a process as a domain user. > h1. Alternative Methods considered: > h2. Process rights limited by security token restriction: > On Windows access decisions are made by examining the security token of a > process. It is possible to spawn a process with a restricted security token. > Any of the rights granted by SIDs of the default token may be restricted. It > is possible to see this in action by examining the security tone of a > sandboxed process launch be a web browser. Typically the launched process > will have a fully restricted token and need to access machine resources > through a dedicated broker process that enforces a custom security policy. > This broker process mechanism would break compatibility with the typical > Hadoop container process. The Container process must be able to utilize > standard function calls for disk and network IO. I performed some work > looking at ways to ACL the local files to the specific launched without > granting rights to other processes launched on the same machine but found > this to be an overly complex solution. > h2. 
Relying on APP containers: > Recent versions of Windows have the ability to launch processes within an > isolated container. Application containers are supported for execution of > WinRT-based executables. This method was ruled out due to the lack of > official support for standard Windows APIs. At some point in the future > Windows may support functionality similar to BSD jails or Linux containers; > at that point support for containers should be added. > h1. Create As User Feature Description: > h2. Usage: > A new sub command was added to the set of task commands. Here is the syntax: > winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] > Some notes: > * The username specified is in the format of "user@domain" > * The machine executing this command must be joined to the domain of the user > specified > * The domain controller must allow the account executing the command access > to the user information. For this, join the account to the predefined group > labeled "Pre-Windows 2000 Compatible Access" > * The account running the command must have several rights on the local > machine. These can be managed manually using secpol.msc: > ** "Act as part of the operating system" - SE_TCB_NAME > ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME > ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME > * The launched process will not have rights to the desktop, so it will not be > able to display any information or create UI. > * The launched process will have no network credentials. Any access of > network resources that requires domain authentication will fail. > h2. Implementation: > Winutils performs the following steps: > # Enable the required privileges for the current process. > # Register as a trusted process with the Local Security Authority (LSA). > # Create a new logon for the user passed on the command line. > # Load/Create a profile on the local machine for the new logon. > # Create a new environment for the new logon.
> # Launch the new process in a job with the task name specified and using the > created logon. > # Wait for the job to exit. > h2. Future work: > The following work was scoped out of this check-in: > * Support for non-domain users or machines that are not domain joined. > * Support for privilege isolation by running the task launcher in a high > privilege service with access over an ACLed named pipe. -- This message was sent by Atlassian JIRA (v6.2#6252)
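For readers who want to script the sub-command quoted in the issue description, here is a minimal sketch of how a caller might assemble its arguments. It is not taken from the patch: the helper name build_create_as_user_args, the regular expression, and the sample values are all hypothetical, and passing COMMAND_LINE as a single argument is an assumption. Only the syntax "winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]" and the "user@domain" username format come from the description.

```python
import re

# Hypothetical helper, not part of Hadoop or winutils. It only assembles
# the argument list for the syntax quoted in the issue description.
_USER_AT_DOMAIN = re.compile(r"^[^@\s]+@[^@\s]+$")


def build_create_as_user_args(task_name, username, command_line,
                              winutils="winutils"):
    """Return argv for: winutils task createAsUser TASKNAME USERNAME COMMAND_LINE."""
    if not task_name:
        raise ValueError("task name must be non-empty")
    # The description states the username is given in "user@domain" form.
    if not _USER_AT_DOMAIN.match(username):
        raise ValueError("username must look like user@domain: %r" % username)
    if not command_line:
        raise ValueError("command line must be non-empty")
    # Assumption: the command line is passed as one argument.
    return [winutils, "task", "createAsUser", task_name, username, command_line]


if __name__ == "__main__":
    # Placeholder values chosen for illustration only.
    print(build_create_as_user_args(
        "container_0001", "alice@EXAMPLE.COM", "cmd /c echo hello"))
```

The resulting list could be handed to subprocess.run on a domain-joined Windows machine whose calling account holds the rights listed in the description. The sketch itself never launches anything.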