[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964: -- Attachment: YARN-1964.patch Harmonized changes between yarn-default.xml and YarnConfiguration. Updated docs. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154522#comment-14154522 ] Hadoop QA commented on YARN-1964: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672275/YARN-1964.patch against trunk revision 17d1202. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5194//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5194//console This message is automatically generated. 
> Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2387) Resource Manager crashes with NPE due to lack of synchronization
[ https://issues.apache.org/jira/browse/YARN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154687#comment-14154687 ] Hudson commented on YARN-2387: -- FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/697/]) YARN-2387. Resource Manager crashes with NPE due to lack of synchronization. Contributed by Mit Desai (jlowe: rev feaf139b4f327d33011e5a4424c06fb44c630955) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerStatusPBImpl.java > Resource Manager crashes with NPE due to lack of synchronization > > > Key: YARN-2387 > URL: https://issues.apache.org/jira/browse/YARN-2387 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2387.patch, YARN-2387.patch, YARN-2387.patch > > > We recently came across a 0.23 RM crashing with an NPE. Here is the > stacktrace for it. 
> {noformat} > 2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34) > at > org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55) > at java.lang.String.valueOf(String.java:2854) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353) > at java.lang.String.valueOf(String.java:2854) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339) > at java.lang.Thread.run(Thread.java:722) > 2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} > On investigating the issue, we found that ContainerStatusPBImpl has > methods that are called by different threads and are not synchronized. The > 2.x code looks the same. > We need to make these methods synchronized so that we do not encounter this > problem in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
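A minimal Java sketch of the fix proposed above. It uses a simplified stand-in for the protobuf-backed record: StatusProto, StatusProtoBuilder, ContainerStatusSketch and ConcurrencyCheck are hypothetical names, not the real ContainerStatusPBImpl or the committed patch. The sketch keeps the lazy local-field/builder/viaProto pattern visible in the stack trace and makes every public method synchronized. As a result, a logging thread calling toString() cannot observe the builder while another thread is replacing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical immutable snapshot standing in for a generated protobuf message.
final class StatusProto {
  final String containerId;
  StatusProto(String containerId) { this.containerId = containerId; }
}

// Hypothetical mutable builder standing in for the generated protobuf Builder.
final class StatusProtoBuilder {
  String containerId;
  StatusProtoBuilder(StatusProto from) { this.containerId = from.containerId; }
  StatusProto build() { return new StatusProto(containerId); }
}

// Lazy-merge record in which every public entry point is synchronized, so
// proto, builder and viaProto are always read and written under one monitor.
final class ContainerStatusSketch {
  private StatusProto proto = new StatusProto(null);
  private StatusProtoBuilder builder = null;
  private boolean viaProto = true;
  private String containerId; // cached locally until the next merge

  public synchronized void setContainerId(String id) {
    maybeInitBuilder();
    this.containerId = id;
  }

  public synchronized StatusProto getProto() {
    mergeLocalToProto();
    return proto;
  }

  @Override
  public synchronized String toString() {
    // Similar call path to the stack trace: toString() -> getProto() -> merge.
    return "ContainerStatus{id=" + getProto().containerId + "}";
  }

  // The private helpers assume the caller already holds the monitor.
  private void mergeLocalToProto() {
    if (viaProto) {
      maybeInitBuilder();
    }
    mergeLocalToBuilder();
    proto = builder.build();
    viaProto = true;
  }

  private void mergeLocalToBuilder() {
    if (containerId != null) {
      builder.containerId = containerId;
    }
  }

  private void maybeInitBuilder() {
    if (viaProto || builder == null) {
      builder = new StatusProtoBuilder(proto);
    }
    viaProto = false;
  }
}

// Drives one instance from several writer and reader threads and counts
// any runtime exceptions they hit.
final class ConcurrencyCheck {
  static int run(int threads, int iterations) {
    ContainerStatusSketch status = new ContainerStatusSketch();
    AtomicInteger failures = new AtomicInteger();
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final boolean writer = t % 2 == 0;
      Thread w = new Thread(() -> {
        for (int i = 0; i < iterations; i++) {
          try {
            if (writer) {
              status.setContainerId("container_" + i);
            } else {
              status.toString();
            }
          } catch (RuntimeException e) {
            failures.incrementAndGet();
          }
        }
      });
      workers.add(w);
      w.start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return failures.get();
  }
}
```

Synchronizing every public method is the coarsest option. It is simple to reason about because the private helpers never run without the monitor held.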
[jira] [Commented] (YARN-2610) Hamlet should close table tags
[ https://issues.apache.org/jira/browse/YARN-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154684#comment-14154684 ] Hudson commented on YARN-2610: -- FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/697/]) YARN-2610. Hamlet should close table tags. (Ray Chiang via kasha) (kasha: rev f7743dd07dfbe0dde9be71acfaba16ded52adba7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/hamlet/Hamlet.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/hamlet/TestHamlet.java > Hamlet should close table tags > -- > > Key: YARN-2610 > URL: https://issues.apache.org/jira/browse/YARN-2610 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: supportability > Fix For: 2.6.0 > > Attachments: YARN-2610-01.patch, YARN-2610-02.patch > > > Revisiting a subset of MAPREDUCE-2993. > The , , , , tags are not configured to close > properly in Hamlet. While this is allowed in HTML 4.01, missing closing > table tags tends to wreak havoc with a lot of HTML processors (although not > usually browsers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
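A small sketch of the property this fix aims for, using a hypothetical TableWriter class that is not Hamlet's API. The writer pushes every element it opens onto a stack and closes each one explicitly, innermost first, in either end() or finish(). HTML processors therefore never have to infer an implied close tag.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical HTML writer that tracks open elements so each one is closed.
final class TableWriter {
  private final StringBuilder out = new StringBuilder();
  private final Deque<String> open = new ArrayDeque<>();

  TableWriter start(String tag) {
    out.append('<').append(tag).append('>');
    open.push(tag);
    return this;
  }

  // Escapes the characters that would otherwise be parsed as markup.
  TableWriter text(String s) {
    out.append(s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
    return this;
  }

  // Closes the innermost open element; throws if nothing is open.
  TableWriter end() {
    out.append("</").append(open.pop()).append('>');
    return this;
  }

  // Closes everything still open, innermost first, and returns the markup.
  String finish() {
    while (!open.isEmpty()) {
      end();
    }
    return out.toString();
  }
}
```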
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154679#comment-14154679 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/697/]) YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/bin/yarn > truly shared cache for jars (jobjar/libjar) > --- > > Key: YARN-1492 > URL: https://issues.apache.org/jira/browse/YARN-1492 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.0.4-alpha >Reporter: Sangjin Lee 
>Assignee: Chris Trezzo >Priority: Critical > Attachments: YARN-1492-all-trunk-v1.patch, > YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, > YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, > shared_cache_design.pdf, shared_cache_design_v2.pdf, > shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, > shared_cache_design_v5.pdf, shared_cache_design_v6.pdf > > > Currently there is the distributed cache that enables you to cache jars and > files so that attempts from the same job can reuse them. However, sharing is > limited with the distributed cache because it is normally on a per-job basis. > On a large cluster, sometimes copying of jobjars and libjars becomes so > prevalent that it consumes a large portion of the network bandwidth, not to > speak of defeating the purpose of "bringing compute to where data is". This > is wasteful because in most cases code doesn't change much across many jobs. > I'd like to propose and discuss the feasibility of introducing a truly shared > cache so that multiple jobs from multiple users can share and cache jars. > This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
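A minimal sketch of the sharing idea. It assumes jars are identified by a checksum of their contents. The description above does not specify a keying scheme, so SHA-256 and the SharedJarIndex class are assumptions for illustration only. With content-based keys, jobs from different users that upload byte-identical jars resolve to the same shared path instead of copying the jar again.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical content-addressed index from jar checksum to shared location.
final class SharedJarIndex {
  private final Map<String, String> pathByChecksum = new ConcurrentHashMap<>();

  // Hex-encoded SHA-256 of the jar bytes.
  static String checksum(byte[] content) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 must be supported by every JVM", e);
    }
  }

  // Returns the path already registered for identical content, or registers
  // candidatePath if this content has not been seen before.
  String publish(byte[] content, String candidatePath) {
    String existing = pathByChecksum.putIfAbsent(checksum(content), candidatePath);
    return existing != null ? existing : candidatePath;
  }

  Optional<String> lookup(byte[] content) {
    return Optional.ofNullable(pathByChecksum.get(checksum(content)));
  }
}
```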
[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport
[ https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154690#comment-14154690 ] Hudson commented on YARN-2594: -- FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/697/]) YARN-2594. Potential deadlock in RM when querying ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 14d60dadc25b044a2887bf912ba5872367f2dffb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java > Potential deadlock in RM when querying ApplicationResourceUsageReport > - > > Key: YARN-2594 > URL: https://issues.apache.org/jira/browse/YARN-2594 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karam Singh >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch > > > ResourceManager sometimes becomes unresponsive: > There was no exception in the ResourceManager log; it contained only the following > type of messages: > {code} > 2014-09-19 19:13:45,241 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000 > 2014-09-19 19:30:26,312 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000 > 2014-09-19 19:47:07,351 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000 > 2014-09-19 20:03:48,460 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000 > 2014-09-19 20:20:29,542 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000 > 2014-09-19 20:37:10,635 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000 > 2014-09-19 20:53:51,722 INFO event.AsyncDispatcher
> (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
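A steadily growing event-queue size is consistent with the dispatcher's handler thread being blocked. The following generic Java sketch shows one way to avoid a lock-order deadlock of this kind. It uses two hypothetical objects, AppSketch and AttemptSketch, each of which guards its state with a ReentrantReadWriteLock. It illustrates the technique only and does not reproduce the committed RMAppImpl change.

Consider what would happen if usageReportMb() held the app's read lock while calling attempt.usedMemoryMb(), and at the same time another thread inside allocate() held the attempt's write lock while calling owner.onAttemptUpdated(). Each thread would then wait on a lock the other holds. The sketch breaks that cycle by copying the reference and releasing the first lock before making the second call.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical application object guarded by its own read/write lock.
final class AppSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private AttemptSketch current;
  private long updateEvents;

  void setCurrentAttempt(AttemptSketch attempt) {
    lock.writeLock().lock();
    try { current = attempt; } finally { lock.writeLock().unlock(); }
  }

  // Called by the attempt only after it has released its own lock.
  void onAttemptUpdated() {
    lock.writeLock().lock();
    try { updateEvents++; } finally { lock.writeLock().unlock(); }
  }

  long updateEvents() {
    lock.readLock().lock();
    try { return updateEvents; } finally { lock.readLock().unlock(); }
  }

  // Copies the attempt reference under the app lock, releases it, and only
  // then calls into the attempt, so the two locks are never held together.
  long usageReportMb() {
    AttemptSketch attempt;
    lock.readLock().lock();
    try { attempt = current; } finally { lock.readLock().unlock(); }
    return attempt == null ? 0L : attempt.usedMemoryMb();
  }
}

// Hypothetical attempt object with its own lock and a back-reference to its app.
final class AttemptSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final AppSketch owner;
  private long usedMemoryMb;

  AttemptSketch(AppSketch owner) { this.owner = owner; }

  long usedMemoryMb() {
    lock.readLock().lock();
    try { return usedMemoryMb; } finally { lock.readLock().unlock(); }
  }

  void allocate(long mb) {
    lock.writeLock().lock();
    try { usedMemoryMb += mb; } finally { lock.writeLock().unlock(); }
    owner.onAttemptUpdated(); // notify only after releasing this attempt's lock
  }
}
```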
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154680#comment-14154680 ] Hudson commented on YARN-2179: -- FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/697/]) YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/bin/yarn > Initial cache manager structure and context > --- > > Key: YARN-2179 > URL: https://issues.apache.org/jira/browse/YARN-2179 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 2.7.0 
> > Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, > YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, > YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, > YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch > > > Implement the initial shared cache manager structure and context. The > SCMContext will be used by a number of manager services (i.e. the backing > store and the cleaner service). The AppChecker is used to gather the > currently running applications on SCM startup (necessary for an SCM that is > backed by an in-memory store). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
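A rough sketch of why an in-memory store needs the startup list. The sketch assumes that a restarted SCM cannot know which resources the already-running applications reference, so it must stay conservative until all of those applications finish. RunningAppsSource stands in for the AppChecker role and InMemoryStoreSketch for the store. Both are hypothetical names, not the classes added in this commit.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the AppChecker role: reports running application IDs.
interface RunningAppsSource {
  Collection<String> activeApplicationIds();
}

// Remembers which applications were already running when the store started,
// since an in-memory store cannot know what resources they reference.
final class InMemoryStoreSketch {
  private final RunningAppsSource checker;
  private final Set<String> appsRunningAtStartup = new HashSet<>();

  InMemoryStoreSketch(RunningAppsSource checker) {
    this.checker = checker;
  }

  // Snapshot the running applications once, at startup.
  void start() {
    appsRunningAtStartup.clear();
    appsRunningAtStartup.addAll(checker.activeApplicationIds());
  }

  // True once every application that was running at startup has finished;
  // only then would it be safe to treat unreferenced entries as unused.
  boolean startupAppsFinished() {
    appsRunningAtStartup.retainAll(checker.activeApplicationIds());
    return appsRunningAtStartup.isEmpty();
  }
}
```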
[jira] [Commented] (YARN-2602) Generic History Service of TimelineServer sometimes not able to handle NPE
[ https://issues.apache.org/jira/browse/YARN-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154686#comment-14154686 ] Hudson commented on YARN-2602: -- FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/697/]) YARN-2602. Fixed possible NPE in ApplicationHistoryManagerOnTimelineStore. Contributed by Zhijie Shen (jianhe: rev bbff96be48119774688981d04baf444639135977) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java > Generic History Service of TimelineServer sometimes not able to handle NPE > -- > > Key: YARN-2602 > URL: https://issues.apache.org/jira/browse/YARN-2602 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 > Environment: ATS is running with AHS/GHS enabled to use TimelineStore. > Running for 4-5 days, with many random example jobs running >Reporter: Karam Singh >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2602.1.patch > > > ATS is running with AHS/GHS enabled to use TimelineStore. > Running for 4-5 days, with many random example jobs running.
> When I ran the WS API for AHS/GHS: > {code} > curl --negotiate -u : > 'http:///v1/applicationhistory/apps/application_1411579118376_0001' > {code} > it ran successfully. > However, the following > {code} > curl --negotiate -u : > 'http:///ws/v1/applicationhistory/apps' > {"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"} > {code} > failed with an Internal Server Error (500). > After looking at the TimelineServer logs, I found that there was an NPE: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
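The log excerpt is cut off, so the actual cause is not shown here. The following hedged Java sketch shows one common way a listing endpoint can fail while single-item lookups succeed. When the endpoint converts every stored record, unboxing a missing numeric field in any one of them throws a NullPointerException, and that single incomplete record fails the whole request. AppSummary, AppSummaryConverter and the info keys submitTime and finalStatus are hypothetical, not the real code in ApplicationHistoryManagerOnTimelineStore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified application summary built from a stored entity.
final class AppSummary {
  final String appId;
  final long submitTimeMs;
  final String finalStatus; // may be null, e.g. for an app that has not finished

  AppSummary(String appId, long submitTimeMs, String finalStatus) {
    this.appId = appId;
    this.submitTimeMs = submitTimeMs;
    this.finalStatus = finalStatus;
  }
}

final class AppSummaryConverter {
  // A fragile version such as `long t = (Long) info.get("submitTime");` throws
  // NullPointerException when the key is absent, because unboxing null fails.
  static long longOrDefault(Map<String, Object> info, String key, long dflt) {
    Object v = info == null ? null : info.get(key);
    return v instanceof Number ? ((Number) v).longValue() : dflt;
  }

  static String stringOrNull(Map<String, Object> info, String key) {
    Object v = info == null ? null : info.get(key);
    return v == null ? null : v.toString();
  }

  // Converts every stored entity, tolerating missing optional fields so that
  // one incomplete record does not fail the whole listing.
  static List<AppSummary> convertAll(Map<String, Map<String, Object>> entitiesById) {
    List<AppSummary> out = new ArrayList<>();
    for (Map.Entry<String, Map<String, Object>> e : entitiesById.entrySet()) {
      Map<String, Object> info = e.getValue();
      out.add(new AppSummary(e.getKey(),
          longOrDefault(info, "submitTime", 0L),
          stringOrNull(info, "finalStatus")));
    }
    return out;
  }
}
```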
[jira] [Commented] (YARN-2627) Add logs when attemptFailuresValidityInterval is enabled
[ https://issues.apache.org/jira/browse/YARN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154691#comment-14154691 ] Hudson commented on YARN-2627: -- FAILURE: Integrated in Hadoop-Yarn-trunk #697 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/697/]) YARN-2627. Added the info logs of attemptFailuresValidityInterval and number of previous failed attempts. Contributed by Xuan Gong. (zjshen: rev 9582a50176800433ad3fa8829a50c28b859812a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt > Add logs when attemptFailuresValidityInterval is enabled > > > Key: YARN-2627 > URL: https://issues.apache.org/jira/browse/YARN-2627 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.6.0 > > Attachments: YARN-2627.1.patch, YARN-2627.2.patch > > > After YARN-611, users can specify attemptFailuresValidityInterval for their > applications. This is for testing/debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
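A minimal sketch of how a validity interval changes the failure count. It makes two assumptions, both based on the YARN-611 feature rather than on this patch: failures older than the interval stop counting toward the application's maximum attempts, and a non-positive interval disables the window. AttemptFailureWindow is a hypothetical helper. A log line like the one this change adds would report the interval together with the resulting count.

```java
import java.util.List;

// Hypothetical helper that counts failed attempts inside a sliding window.
final class AttemptFailureWindow {
  // Counts the failures that still count against the max-attempts limit.
  // A non-positive interval is treated as "disabled": every failure counts.
  static int countedFailures(List<Long> failureTimesMs, long nowMs, long validityIntervalMs) {
    int counted = 0;
    for (long failedAt : failureTimesMs) {
      if (validityIntervalMs <= 0 || nowMs - failedAt <= validityIntervalMs) {
        counted++;
      }
    }
    return counted;
  }
}
```

For example, with failures at 0, 5000 and 9000 ms and `now` at 10000 ms, a 6000 ms interval counts only the last two failures.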
[jira] [Commented] (YARN-2610) Hamlet should close table tags
[ https://issues.apache.org/jira/browse/YARN-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154839#comment-14154839 ] Hudson commented on YARN-2610: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/]) YARN-2610. Hamlet should close table tags. (Ray Chiang via kasha) (kasha: rev f7743dd07dfbe0dde9be71acfaba16ded52adba7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/hamlet/Hamlet.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/hamlet/TestHamlet.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java * hadoop-yarn-project/CHANGES.txt > Hamlet should close table tags > -- > > Key: YARN-2610 > URL: https://issues.apache.org/jira/browse/YARN-2610 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: supportability > Fix For: 2.6.0 > > Attachments: YARN-2610-01.patch, YARN-2610-02.patch > > > Revisiting a subset of MAPREDUCE-2993. > The , , , , tags are not configured to close > properly in Hamlet. While this is allowed in HTML 4.01, missing closing > table tags tends to wreak havoc with a lot of HTML processors (although not > usually browsers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154833#comment-14154833 ] Hudson commented on YARN-1492: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/]) YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/CHANGES.txt > truly shared cache for jars (jobjar/libjar) > --- > > Key: YARN-1492 > URL: https://issues.apache.org/jira/browse/YARN-1492 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.0.4-alpha >Reporter: Sangjin Lee 
>Assignee: Chris Trezzo >Priority: Critical > Attachments: YARN-1492-all-trunk-v1.patch, > YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, > YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, > shared_cache_design.pdf, shared_cache_design_v2.pdf, > shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, > shared_cache_design_v5.pdf, shared_cache_design_v6.pdf > > > Currently there is the distributed cache that enables you to cache jars and > files so that attempts from the same job can reuse them. However, sharing is > limited with the distributed cache because it is normally on a per-job basis. > On a large cluster, sometimes copying of jobjars and libjars becomes so > prevalent that it consumes a large portion of the network bandwidth, not to > speak of defeating the purpose of "bringing compute to where data is". This > is wasteful because in most cases code doesn't change much across many jobs. > I'd like to propose and discuss the feasibility of introducing a truly shared > cache so that multiple jobs from multiple users can share and cache jars. > This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2602) Generic History Service of TimelineServer sometimes not able to handle NPE
[ https://issues.apache.org/jira/browse/YARN-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154841#comment-14154841 ] Hudson commented on YARN-2602: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/]) YARN-2602. Fixed possible NPE in ApplicationHistoryManagerOnTimelineStore. Contributed by Zhijie Shen (jianhe: rev bbff96be48119774688981d04baf444639135977) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt > Generic History Service of TimelineServer sometimes not able to handle NPE > -- > > Key: YARN-2602 > URL: https://issues.apache.org/jira/browse/YARN-2602 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 > Environment: ATS is running with AHS/GHS enabled to use TimelineStore. > Running for 4-5 days, with many random example jobs running >Reporter: Karam Singh >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2602.1.patch > > > ATS is running with AHS/GHS enabled to use TimelineStore. > Running for 4-5 days, with many random example jobs running.
> When I ran the WS API for AHS/GHS: > {code} > curl --negotiate -u : > 'http:///v1/applicationhistory/apps/application_1411579118376_0001' > {code} > it ran successfully. > However, the following > {code} > curl --negotiate -u : > 'http:///ws/v1/applicationhistory/apps' > {"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"} > {code} > failed with an Internal Server Error (500). > After looking at the TimelineServer logs, I found that there was an NPE: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport
[ https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154845#comment-14154845 ] Hudson commented on YARN-2594: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/]) YARN-2594. Potential deadlock in RM when querying ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 14d60dadc25b044a2887bf912ba5872367f2dffb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt > Potential deadlock in RM when querying ApplicationResourceUsageReport > - > > Key: YARN-2594 > URL: https://issues.apache.org/jira/browse/YARN-2594 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karam Singh >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch > > > ResourceManager sometimes becomes unresponsive: > There was no exception in the ResourceManager log; it contained only the following > type of messages: > {code} > 2014-09-19 19:13:45,241 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000 > 2014-09-19 19:30:26,312 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000 > 2014-09-19 19:47:07,351 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000 > 2014-09-19 20:03:48,460 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000 > 2014-09-19 20:20:29,542 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000 > 2014-09-19 20:37:10,635 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000 > 2014-09-19 20:53:51,722 INFO
event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154834#comment-14154834 ] Hudson commented on YARN-2179: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/]) YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/CHANGES.txt > Initial cache manager structure and context > --- > > Key: YARN-2179 > URL: https://issues.apache.org/jira/browse/YARN-2179 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 
2.7.0 > > Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, > YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, > YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, > YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch > > > Implement the initial shared cache manager structure and context. The > SCMContext will be used by a number of manager services (i.e. the backing > store and the cleaner service). The AppChecker is used to gather the > currently running applications on SCM startup (necessary for an SCM that is > backed by an in-memory store). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2627) Add logs when attemptFailuresValidityInterval is enabled
[ https://issues.apache.org/jira/browse/YARN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154846#comment-14154846 ] Hudson commented on YARN-2627: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/]) YARN-2627. Added the info logs of attemptFailuresValidityInterval and number of previous failed attempts. Contributed by Xuan Gong. (zjshen: rev 9582a50176800433ad3fa8829a50c28b859812a3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java > Add logs when attemptFailuresValidityInterval is enabled > > > Key: YARN-2627 > URL: https://issues.apache.org/jira/browse/YARN-2627 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.6.0 > > Attachments: YARN-2627.1.patch, YARN-2627.2.patch > > > After YARN-611, users can specify attemptFailuresValidityInterval for their > applications. This is for testing/debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2387) Resource Manager crashes with NPE due to lack of synchronization
[ https://issues.apache.org/jira/browse/YARN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154842#comment-14154842 ] Hudson commented on YARN-2387: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1888 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1888/]) YARN-2387. Resource Manager crashes with NPE due to lack of synchronization. Contributed by Mit Desai (jlowe: rev feaf139b4f327d33011e5a4424c06fb44c630955) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerStatusPBImpl.java > Resource Manager crashes with NPE due to lack of synchronization > > > Key: YARN-2387 > URL: https://issues.apache.org/jira/browse/YARN-2387 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2387.patch, YARN-2387.patch, YARN-2387.patch > > > We recently came across a 0.23 RM crashing with an NPE. Here is the > stacktrace for it. 
> {noformat} > 2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34) > at > org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55) > at java.lang.String.valueOf(String.java:2854) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353) > at java.lang.String.valueOf(String.java:2854) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339) > at java.lang.Thread.run(Thread.java:722) > 2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} > On investigating the issue, we found that ContainerStatusPBImpl has > methods that are called by different threads and are not synchronized. The > 2.x code looks the same. > We need to make these methods synchronized so that we do not encounter this > problem in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
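The fix proposed above, making every ContainerStatusPBImpl method that touches shared state synchronized, can be sketched without a protobuf dependency. This is a minimal illustration, not the actual patch: StatusMsg, StatusMsgBuilder, SyncedStatusRecord and StatusRaceDemo are hypothetical stand-ins for the generated message, its builder, the PBImpl wrapper and a stress harness. The setters, getProto() and toString() all take the same monitor, so the mergeLocalToProto()/mergeLocalToBuilder() sequence seen in the stack trace cannot interleave with a concurrent mutation. The toy classes do not reproduce the original NPE; they only illustrate the locking discipline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable message, standing in for a generated protobuf class.
final class StatusMsg {
    final String diagnostics;
    final int exitStatus;
    StatusMsg(String diagnostics, int exitStatus) {
        this.diagnostics = diagnostics;
        this.exitStatus = exitStatus;
    }
}

// Hypothetical mutable builder, seeded from an existing message (like newBuilder(proto)).
final class StatusMsgBuilder {
    String diagnostics = "";
    int exitStatus;
    StatusMsgBuilder(StatusMsg from) {
        if (from != null) {
            diagnostics = from.diagnostics;
            exitStatus = from.exitStatus;
        }
    }
    StatusMsg build() { return new StatusMsg(diagnostics, exitStatus); }
}

// PBImpl-style wrapper: every entry point takes the same monitor, so toString() on one
// thread never sees proto/builder/viaProto half-updated by another thread.
class SyncedStatusRecord {
    private StatusMsg proto;
    private StatusMsgBuilder builder = new StatusMsgBuilder(null);
    private boolean viaProto = false;
    private String localDiagnostics; // "local" field, merged into the builder lazily

    synchronized StatusMsg getProto() {
        mergeLocalToProto();
        return proto;
    }
    synchronized void setExitStatus(int status) {
        maybeInitBuilder();
        builder.exitStatus = status;
    }
    synchronized void setDiagnostics(String diagnostics) {
        maybeInitBuilder();
        localDiagnostics = diagnostics;
    }
    @Override
    public synchronized String toString() {
        StatusMsg p = getProto();
        return "exit=" + p.exitStatus + " diag=" + p.diagnostics;
    }

    // Helpers below are only ever called while holding this object's monitor.
    private void maybeInitBuilder() {
        if (viaProto || builder == null) {
            builder = new StatusMsgBuilder(proto);
        }
        viaProto = false;
    }
    private void mergeLocalToBuilder() {
        if (localDiagnostics != null) {
            builder.diagnostics = localDiagnostics;
        }
    }
    private void mergeLocalToProto() {
        if (viaProto) {
            maybeInitBuilder();
        }
        mergeLocalToBuilder();
        proto = builder.build();
        viaProto = true;
    }
}

final class StatusRaceDemo {
    // Runs writer and reader threads against one record; returns the first failure, or null.
    static Throwable hammer(SyncedStatusRecord target, int threads, int iterations) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final boolean writer = (t % 2 == 0);
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < iterations; i++) {
                        if (writer) {
                            target.setExitStatus(i);
                            target.setDiagnostics("d" + i);
                        } else {
                            target.toString();
                        }
                    }
                } catch (Throwable e) {
                    failure.compareAndSet(null, e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                failure.compareAndSet(null, e);
            }
        }
        return failure.get();
    }
}
```

Synchronizing in place keeps the record's existing getter/setter API unchanged for callers such as LeafQueue, at the cost of serializing all access to a single record.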
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154851#comment-14154851 ] Jason Lowe commented on YARN-2179: -- The pom versions are incorrect in branch-2 from the cherry-pick. The pom says 3.0.0-SNAPSHOT, but it needs to be 2.6.0-SNAPSHOT in branch-2. > Initial cache manager structure and context > --- > > Key: YARN-2179 > URL: https://issues.apache.org/jira/browse/YARN-2179 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 2.7.0 > > Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, > YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, > YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, > YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch > > > Implement the initial shared cache manager structure and context. The > SCMContext will be used by a number of manager services (i.e. the backing > store and the cleaner service). The AppChecker is used to gather the > currently running applications on SCM startup (necessary for an SCM that is > backed by an in-memory store). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2633) TestContainerLauncherImpl sometimes fails
Mit Desai created YARN-2633: --- Summary: TestContainerLauncherImpl sometimes fails Key: YARN-2633 URL: https://issues.apache.org/jira/browse/YARN-2633 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Mit Desai {noformat} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close() at java.lang.Class.getMethod(Class.java:1665) at org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90) at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154872#comment-14154872 ] Karthik Kambatla commented on YARN-2179: Thanks for catching it, Jason. Just pushed another commit fixing the pom version in sharedcachemanager. > Initial cache manager structure and context > --- > > Key: YARN-2179 > URL: https://issues.apache.org/jira/browse/YARN-2179 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 2.7.0 > > Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, > YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, > YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, > YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch > > > Implement the initial shared cache manager structure and context. The > SCMContext will be used by a number of manager services (i.e. the backing > store and the cleaner service). The AppChecker is used to gather the > currently running applications on SCM startup (necessary for an SCM that is > backed by an in-memory store). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2627) Add logs when attemptFailuresValidityInterval is enabled
[ https://issues.apache.org/jira/browse/YARN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154914#comment-14154914 ] Hudson commented on YARN-2627: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/]) YARN-2627. Added the info logs of attemptFailuresValidityInterval and number of previous failed attempts. Contributed by Xuan Gong. (zjshen: rev 9582a50176800433ad3fa8829a50c28b859812a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt > Add logs when attemptFailuresValidityInterval is enabled > > > Key: YARN-2627 > URL: https://issues.apache.org/jira/browse/YARN-2627 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.6.0 > > Attachments: YARN-2627.1.patch, YARN-2627.2.patch > > > After YARN-611, users can specify attemptFailuresValidityInterval for their > applications. This is for testing/debugging purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154903#comment-14154903 ] Hudson commented on YARN-2179: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/]) YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt > Initial cache manager structure and context > --- > > Key: YARN-2179 > URL: https://issues.apache.org/jira/browse/YARN-2179 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > 
Fix For: 2.7.0 > > Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v10.patch, > YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, > YARN-2179-trunk-v5.patch, YARN-2179-trunk-v6.patch, YARN-2179-trunk-v7.patch, > YARN-2179-trunk-v8.patch, YARN-2179-trunk-v9.patch > > > Implement the initial shared cache manager structure and context. The > SCMContext will be used by a number of manager services (i.e. the backing > store and the cleaner service). The AppChecker is used to gather the > currently running applications on SCM startup (necessary for an SCM that is > backed by an in-memory store). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2387) Resource Manager crashes with NPE due to lack of synchronization
[ https://issues.apache.org/jira/browse/YARN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154910#comment-14154910 ] Hudson commented on YARN-2387: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/]) YARN-2387. Resource Manager crashes with NPE due to lack of synchronization. Contributed by Mit Desai (jlowe: rev feaf139b4f327d33011e5a4424c06fb44c630955) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerStatusPBImpl.java > Resource Manager crashes with NPE due to lack of synchronization > > > Key: YARN-2387 > URL: https://issues.apache.org/jira/browse/YARN-2387 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 2.5.0 >Reporter: Mit Desai >Assignee: Mit Desai >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2387.patch, YARN-2387.patch, YARN-2387.patch > > > We recently came across a 0.23 RM crashing with an NPE. Here is the > stacktrace for it. 
> {noformat} > 2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34) > at > org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55) > at java.lang.String.valueOf(String.java:2854) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353) > at java.lang.String.valueOf(String.java:2854) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339) > at java.lang.Thread.run(Thread.java:722) > 2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. > {noformat} > On investigating the issue, we found that ContainerStatusPBImpl has > methods that are called by different threads and are not synchronized. The > 2.x code looks the same. > We need to make these methods synchronized so that we do not encounter this > problem in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154902#comment-14154902 ] Hudson commented on YARN-1492: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/]) YARN-2179. [YARN-1492] Initial cache manager structure and context. (Chris Trezzo via kasha) (kasha: rev 17d1202c35a1992eab66ea05dfd2baf219a17aec) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestRemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/bin/yarn * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/pom.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/sharedcache/SharedCacheStructureUtil.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/AppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/RemoteAppChecker.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > truly shared cache for jars (jobjar/libjar) > --- > > Key: YARN-1492 > URL: https://issues.apache.org/jira/browse/YARN-1492 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.0.4-alpha >Reporter: Sangjin 
Lee >Assignee: Chris Trezzo >Priority: Critical > Attachments: YARN-1492-all-trunk-v1.patch, > YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, > YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, > shared_cache_design.pdf, shared_cache_design_v2.pdf, > shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, > shared_cache_design_v5.pdf, shared_cache_design_v6.pdf > > > Currently there is the distributed cache, which enables you to cache jars and > files so that attempts from the same job can reuse them. However, sharing is > limited with the distributed cache because it is normally on a per-job basis. > On a large cluster, copying jobjars and libjars sometimes becomes so > prevalent that it consumes a large portion of the network bandwidth, not to > mention defeating the purpose of "bringing compute to where data is". This > is wasteful because in most cases code doesn't change much across many jobs. > I'd like to propose and discuss the feasibility of introducing a truly shared > cache so that multiple jobs from multiple users can share and cache jars. > This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
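One way to let multiple jobs from multiple users share jars, as proposed above, is to key cached resources by a checksum of their contents, so identical jars resolve to a single cached copy no matter who uploads them. The sketch below assumes that design using SHA-256; ContentAddressedJarCache and lookupOrRegister are hypothetical names, not the API or storage layout described in the attached shared_cache_design documents, and candidatePath stands for wherever a caller would otherwise upload its own copy.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: jars are keyed by a digest of their bytes, so identical jars
// uploaded by different jobs or users resolve to a single cached copy.
final class ContentAddressedJarCache {
    private final Map<String, String> pathByChecksum = new ConcurrentHashMap<>();

    // Hex-encoded SHA-256 of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Returns the path of an existing cached copy with the same content, or registers
    // candidatePath if this content has not been seen before.
    String lookupOrRegister(byte[] jarBytes, String candidatePath) {
        return pathByChecksum.computeIfAbsent(sha256Hex(jarBytes), key -> candidatePath);
    }

    int size() {
        return pathByChecksum.size();
    }
}
```

ConcurrentHashMap.computeIfAbsent is performed atomically, so two jobs registering the same content at the same time still end up sharing one entry.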
[jira] [Commented] (YARN-2610) Hamlet should close table tags
[ https://issues.apache.org/jira/browse/YARN-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154907#comment-14154907 ] Hudson commented on YARN-2610: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/]) YARN-2610. Hamlet should close table tags. (Ray Chiang via kasha) (kasha: rev f7743dd07dfbe0dde9be71acfaba16ded52adba7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/hamlet/TestHamlet.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/hamlet/Hamlet.java > Hamlet should close table tags > -- > > Key: YARN-2610 > URL: https://issues.apache.org/jira/browse/YARN-2610 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Ray Chiang > Labels: supportability > Fix For: 2.6.0 > > Attachments: YARN-2610-01.patch, YARN-2610-02.patch > > > Revisiting a subset of MAPREDUCE-2993. > The , , , , tags are not configured to close > properly in Hamlet. While this is allowed in HTML 4.01, missing closing > table tags tends to wreak havoc with a lot of HTML processors (although not > usually browsers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
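The fix direction described above, always emitting closing table tags, can be illustrated with a tiny writer that tracks open elements on a stack and closes whatever is still open when the output is finished. ClosingHtmlWriter is a hypothetical class for illustration only; it is not Hamlet's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical writer: every start() is matched by an explicit or automatic close tag.
final class ClosingHtmlWriter {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> open = new ArrayDeque<>();

    // Opens an element and remembers it so it is guaranteed to be closed later.
    ClosingHtmlWriter start(String tag) {
        out.append('<').append(tag).append('>');
        open.push(tag);
        return this;
    }

    // Appends text content with the HTML-significant characters escaped.
    ClosingHtmlWriter text(String s) {
        out.append(s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
        return this;
    }

    // Closes the innermost open element.
    ClosingHtmlWriter end() {
        if (open.isEmpty()) {
            throw new IllegalStateException("no open element to close");
        }
        out.append("</").append(open.pop()).append('>');
        return this;
    }

    // Closes every element that is still open, innermost first, and returns the markup.
    String finish() {
        while (!open.isEmpty()) {
            end();
        }
        return out.toString();
    }
}
```

Closing tags explicitly costs a few bytes per cell but keeps the output well-formed for strict HTML/XML processors, which is the concern raised above.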
[jira] [Commented] (YARN-2602) Generic History Service of TimelineServer sometimes not able to handle NPE
[ https://issues.apache.org/jira/browse/YARN-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154909#comment-14154909 ] Hudson commented on YARN-2602: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/]) YARN-2602. Fixed possible NPE in ApplicationHistoryManagerOnTimelineStore. Contributed by Zhijie Shen (jianhe: rev bbff96be48119774688981d04baf444639135977) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java > Generic History Service of TimelineServer sometimes not able to handle NPE > -- > > Key: YARN-2602 > URL: https://issues.apache.org/jira/browse/YARN-2602 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 > Environment: ATS is running with AHS/GHS enabled to use TimelineStore. > Running for 4-5 days, with many random example jobs running >Reporter: Karam Singh >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2602.1.patch > > > ATS is running with AHS/GHS enabled to use TimelineStore. > Running for 4-5 days, with many random example jobs running.
> When I ran the WS API for AHS/GHS: > {code} > curl --negotiate -u : > 'http:///v1/applicationhistory/apps/application_1411579118376_0001' > {code} > It ran successfully. > However, > {code} > curl --negotiate -u : > 'http:///ws/v1/applicationhistory/apps' > {"exception":"WebApplicationException","message":"java.lang.NullPointerException","javaClassName":"javax.ws.rs.WebApplicationException"} > {code} > failed with Internal Server Error 500. > After looking at the TimelineServer logs, I found an NPE: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2594) Potential deadlock in RM when querying ApplicationResourceUsageReport
[ https://issues.apache.org/jira/browse/YARN-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154913#comment-14154913 ] Hudson commented on YARN-2594: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1913 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1913/]) YARN-2594. Potential deadlock in RM when querying ApplicationResourceUsageReport. (Wangda Tan via kasha) (kasha: rev 14d60dadc25b044a2887bf912ba5872367f2dffb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt > Potential deadlock in RM when querying ApplicationResourceUsageReport > - > > Key: YARN-2594 > URL: https://issues.apache.org/jira/browse/YARN-2594 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karam Singh >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2594.patch, YARN-2594.patch, YARN-2594.patch > > > ResourceManager sometimes becomes unresponsive. > There was no exception in the ResourceManager log, which contains only the following > type of messages: > {code} > 2014-09-19 19:13:45,241 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 53000 > 2014-09-19 19:30:26,312 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 54000 > 2014-09-19 19:47:07,351 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 55000 > 2014-09-19 20:03:48,460 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 56000 > 2014-09-19 20:20:29,542 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 57000 > 2014-09-19 20:37:10,635 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 58000 > 2014-09-19 20:53:51,722 INFO
event.AsyncDispatcher > (AsyncDispatcher.java:handle(232)) - Size of event-queue is 59000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades
[ https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154931#comment-14154931 ] Junping Du commented on YARN-2613: -- +1. Patch looks good to me. Will commit it shortly. > NMClient doesn't have retries for supporting rolling-upgrades > - > > Key: YARN-2613 > URL: https://issues.apache.org/jira/browse/YARN-2613 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch > > > While the NM is undergoing a rolling upgrade, the client should retry the NM until it comes up. This > JIRA is to add an NMProxy (similar to RMProxy) with a retry implementation to > support rolling upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
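A client-side retry wrapper of the kind described above can be sketched with java.lang.reflect.Proxy: every call on the interface is forwarded to the real client and retried with a fixed sleep while it fails with a designated "NM unavailable" exception. This is a sketch under stated assumptions, not the NMProxy added by this JIRA: NodeManagerClient, FlakyNodeManager and RetryingProxy are hypothetical names, and UncheckedIOException stands in for whatever connection errors a real client would treat as retriable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical client-facing protocol; a stand-in for the real NM protocol.
interface NodeManagerClient {
    String startContainer(String containerId);
}

// Hypothetical NM that is "down" for the first N calls, as during a rolling upgrade.
class FlakyNodeManager implements NodeManagerClient {
    private int failuresLeft;
    int calls;

    FlakyNodeManager(int failures) {
        this.failuresLeft = failures;
    }

    @Override
    public String startContainer(String containerId) {
        calls++;
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new UncheckedIOException(new IOException("NM is restarting"));
        }
        return "started " + containerId;
    }
}

final class RetryingProxy {
    private RetryingProxy() {}

    // Wraps target so each interface call is retried up to maxAttempts times, sleeping
    // sleepMillis between attempts, while it throws retryOn; other exceptions propagate at once.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, T target, Class<? extends Throwable> retryOn,
                        int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        InvocationHandler handler = (proxy, method, args) -> {
            Throwable last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    Throwable cause = e.getCause();
                    if (!retryOn.isInstance(cause)) {
                        throw cause; // not retriable: surface the original exception
                    }
                    last = cause;
                    if (attempt < maxAttempts) {
                        Thread.sleep(sleepMillis);
                    }
                }
            }
            throw last; // every attempt failed with a retriable exception
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

A production policy would more likely bound the total wait time or use exponential backoff rather than a fixed sleep, so that a long NM upgrade does not translate into a tight retry loop.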
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154934#comment-14154934 ] Hadoop QA commented on YARN-2180: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672206/YARN-2180-trunk-v6.patch against trunk revision 17d1202. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5195//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5195//console This message is automatically generated. 
> In-memory backing store for cache manager > - > > Key: YARN-2180 > URL: https://issues.apache.org/jira/browse/YARN-2180 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, > YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, > YARN-2180-trunk-v6.patch > > > Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2634) Test failure for TestClientRMTokens
Junping Du created YARN-2634: Summary: Test failure for TestClientRMTokens Key: YARN-2634 URL: https://issues.apache.org/jira/browse/YARN-2634 Project: Hadoop YARN Issue Type: Test Reporter: Junping Du The test failed as shown below: {noformat} --- Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens --- Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 22.693 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272) testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 20.087 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283) testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.031 sec <<< ERROR!
java.lang.NullPointerException: null at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241) testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.061 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261) testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) Time elapsed: 0.07 sec <<< ERROR! java.lang.NullPointerException: null at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684) at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149) 1,1 Top {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2634) Test failure for TestClientRMTokens
[ https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2634: - Target Version/s: 2.6.0 > Test failure for TestClientRMTokens > --- > > Key: YARN-2634 > URL: https://issues.apache.org/jira/browse/YARN-2634 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du
[jira] [Commented] (YARN-2545) RMApp should transit to FAILED when AM calls finishApplicationMaster with FAILED
[ https://issues.apache.org/jira/browse/YARN-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154946#comment-14154946 ] Hong Zhiguo commented on YARN-2545: --- How about the state of the appAttempt? Should it finally be FAILED instead of FINISHED? > RMApp should transit to FAILED when AM calls finishApplicationMaster with > FAILED > > > Key: YARN-2545 > URL: https://issues.apache.org/jira/browse/YARN-2545 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > > If the AM calls finishApplicationMaster with getFinalApplicationStatus()==FAILED, > and then exits, the corresponding RMApp and RMAppAttempt transit to state > FINISHED. > I think this is wrong and confusing. On the RM web UI, this application is > displayed as "State=FINISHED, FinalStatus=FAILED", and is counted under "Apps > Completed", not under "Apps Failed".
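The rule the report asks for can be sketched as a mapping from the final status an AM passes to finishApplicationMaster to the terminal state the app ends up in. The enums and class below are hypothetical stand-ins, not the ResourceManager's actual RMAppState or RMAppImpl state machine. Statuses other than FAILED are left at FINISHED purely as an assumption, since the report only discusses FAILED.

```java
// Hypothetical mirror of the final statuses an AM can report on unregister.
enum ReportedFinalStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

// Hypothetical terminal states relevant to this report.
enum TerminalAppState { FINISHED, FAILED }

final class UnregisterOutcome {
    private UnregisterOutcome() {}

    /**
     * Proposed rule: an AM that unregisters with FAILED leaves the app in
     * FAILED, so the web UI would count it under "Apps Failed" rather than
     * "Apps Completed". Every other status keeps the FINISHED outcome
     * described in the report (an assumption of this sketch).
     */
    static TerminalAppState terminalStateFor(ReportedFinalStatus status) {
        if (status == null) {
            throw new IllegalArgumentException("status must not be null");
        }
        return status == ReportedFinalStatus.FAILED
                ? TerminalAppState.FAILED
                : TerminalAppState.FINISHED;
    }
}
```

Whether the RMAppAttempt should follow the same rule is the open question in the comment; this sketch only covers the app.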
[jira] [Updated] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2615: - Attachment: YARN-2615.patch Uploaded the first patch, which includes the changes to ClientToAMTokenIdentifier (with a test), RMDelegationTokenIdentifier and TimelineDelegationTokenIdentifier. The compatibility tests for RMDelegationTokenIdentifier haven't been completed because TestClientRMTokens already fails on trunk (without the code here); I filed YARN-2634 to fix that before getting the test in. > ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2615.patch > > > As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier should also be updated in the same way to allow > their fields to be extended in the future.
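The goal of letting identifier fields be extended later can be illustrated with a length-prefixed encoding: a reader built before a new field existed skips the bytes it does not recognize and stays aligned with whatever follows. This is a swapped-in technique for illustration only; the class, its wire layout, and its field names are hypothetical, and it is not necessarily how YARN-668 or the attached patch serialize identifiers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical identifier with two known fields. The body is written with a
 * length prefix so that an older reader can skip fields added later.
 */
final class ExtensibleIdentifier {
    final String owner;
    final String renewer;

    ExtensibleIdentifier(String owner, String renewer) {
        this.owner = owner;
        this.renewer = renewer;
    }

    /** Layout: [body length][owner][renewer][extra bytes from a newer writer]. */
    byte[] serialize(byte[] newerFields) {
        byte[] o = owner.getBytes(StandardCharsets.UTF_8);
        byte[] r = renewer.getBytes(StandardCharsets.UTF_8);
        int bodyLength = Integer.BYTES + o.length + Integer.BYTES + r.length + newerFields.length;
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + bodyLength);
        buf.putInt(bodyLength);
        buf.putInt(o.length).put(o);
        buf.putInt(r.length).put(r);
        buf.put(newerFields);
        return buf.array();
    }

    /** Reads one identifier and leaves buf positioned just past it. */
    static ExtensibleIdentifier deserialize(ByteBuffer buf) {
        int bodyLength = buf.getInt();
        int end = buf.position() + bodyLength;
        String owner = readString(buf);
        String renewer = readString(buf);
        // Anything between here and `end` belongs to fields this reader does
        // not know about; skip it instead of failing or misreading.
        buf.position(end);
        return new ExtensibleIdentifier(owner, renewer);
    }

    private static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The length prefix is what makes the skip possible: without it, an old reader has no way to know where an unknown trailing field ends.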
[jira] [Updated] (YARN-2634) Test failure for TestClientRMTokens
[ https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2634: - Priority: Blocker (was: Major) > Test failure for TestClientRMTokens > --- > > Key: YARN-2634 > URL: https://issues.apache.org/jira/browse/YARN-2634 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Priority: Blocker
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Attachment: YARN-913-015.patch patch -15: this is patch -14 rebased against trunk, with a conflict fixed > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, yarnregistry.pdf, > yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up, or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to, > and not any others in the cluster. > Some kind of service registry (in the RM, or in ZK) could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves.
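The security argument in the last paragraph, that applications should not write registry entries themselves, can be sketched as a registry with a single trusted writer standing in for the RM. Everything below (class name, path layout, plain-string caller identities) is hypothetical and in-memory; it does not model ZooKeeper or the design in the attached documents.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical in-memory registry: only one trusted writer may publish
 * entries, while any client may look them up.
 */
final class ServiceRegistrySketch {
    private final String trustedWriter;
    private final Map<String, String> endpoints = new ConcurrentHashMap<>();

    ServiceRegistrySketch(String trustedWriter) {
        this.trustedWriter = trustedWriter;
    }

    /** Publishes host:port for a service path; rejects untrusted callers. */
    void register(String caller, String servicePath, String hostAndPort) {
        if (!trustedWriter.equals(caller)) {
            throw new SecurityException(caller + " may not write to the registry");
        }
        endpoints.put(servicePath, hostAndPort);
    }

    /** Lets an application find the one instance it should bind to. */
    Optional<String> lookup(String servicePath) {
        return Optional.ofNullable(endpoints.get(servicePath));
    }
}
```

Keying entries by a path that names the specific instance is what lets an application bind to its own service and not to another copy running elsewhere in the cluster.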
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155093#comment-14155093 ] Vinod Kumar Vavilapalli commented on YARN-1063: --- Tx for the updates [~rusanu]! I am committing this now to unblock the follow-up patches, trusting [~ivanmi]'s reviews on the Windows side of things. > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: Windows >Reporter: Kyle Leckie >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, > YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch > > > h1. Summary: > Securing a Hadoop cluster requires constructing some form of security > boundary around the processes executed in YARN containers. Isolation based on > Windows user isolation seems most feasible. This approach is similar to the > approach taken by the existing LinuxContainerExecutor. The current patch to > winutils.exe adds the ability to create a process as a domain user. > h1. Alternative Methods considered: > h2. Process rights limited by security token restriction: > On Windows, access decisions are made by examining the security token of a > process. It is possible to spawn a process with a restricted security token. > Any of the rights granted by SIDs of the default token may be restricted. It > is possible to see this in action by examining the security token of a > sandboxed process launched by a web browser. Typically the launched process > will have a fully restricted token and need to access machine resources > through a dedicated broker process that enforces a custom security policy. > This broker process mechanism would break compatibility with the typical > Hadoop container process.
The Container process must be able to utilize > standard function calls for disk and network IO. I performed some work > looking at ways to ACL the local files to the specific launched process without > granting rights to other processes launched on the same machine, but found > this to be an overly complex solution. > h2. Relying on APP containers: > Recent versions of Windows have the ability to launch processes within an > isolated container. Application containers are supported for execution of > WinRT-based executables. This method was ruled out due to the lack of > official support for standard Windows APIs. At some point in the future > Windows may support functionality similar to BSD jails or Linux containers; > at that point, support for containers should be added. > h1. Create As User Feature Description: > h2. Usage: > A new subcommand was added to the set of task commands. Here is the syntax: > winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] > Some notes: > * The username specified is in the format of "user@domain" > * The machine executing this command must be joined to the domain of the user > specified > * The domain controller must allow the account executing the command access > to the user information. For this, join the account to the predefined group > labeled "Pre-Windows 2000 Compatible Access" > * The account running the command must have several rights on the local > machine. These can be managed manually using secpol.msc: > ** "Act as part of the operating system" - SE_TCB_NAME > ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME > ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME > * The launched process will not have rights to the desktop, so it will not be > able to display any information or create UI. > * The launched process will have no network credentials. Any access to > network resources that requires domain authentication will fail.
> h2. Implementation: > Winutils performs the following steps: > # Enable the required privileges for the current process. > # Register as a trusted process with the Local Security Authority (LSA). > # Create a new logon for the user passed on the command line. > # Load/Create a profile on the local machine for the new logon. > # Create a new environment for the new logon. > # Launch the new process in a job with the task name specified and using the > created logon. > # Wait for the job to exit. > h2. Future work: > The following work was scoped out of this check-in: > * Support for non-domain users or machines that are not domain-joined. > * Support for privilege isolation by running the task launcher in a high > privilege service with access over an ACLed named pipe.
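The createAsUser syntax described in the Usage section can be illustrated from the Java side as building the argument list a caller would hand to java.lang.ProcessBuilder. The class and method names are hypothetical, the check only enforces the "user@domain" shape from the notes, and the sketch does not launch anything (that needs a domain-joined Windows host whose account holds the listed privileges).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CreateAsUserArgs {
    private CreateAsUserArgs() {}

    /**
     * Builds: winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE].
     * COMMAND_LINE is kept as a single argument.
     */
    static List<String> build(String winutils, String taskName, String user, String commandLine) {
        int at = user.indexOf('@');
        // Require exactly one '@' with non-empty user and domain parts.
        if (at <= 0 || at == user.length() - 1 || user.indexOf('@', at + 1) != -1) {
            throw new IllegalArgumentException("expected user@domain, got: " + user);
        }
        if (taskName.isEmpty() || commandLine.isEmpty()) {
            throw new IllegalArgumentException("task name and command line must be non-empty");
        }
        return Collections.unmodifiableList(Arrays.asList(
                winutils, "task", "createAsUser", taskName, user, commandLine));
    }
}
```

On Windows, ProcessBuilder reassembles these elements into a single command-line string, so how a COMMAND_LINE containing spaces or quotes gets quoted is worth checking against the way winutils parses its arguments.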
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155098#comment-14155098 ] Steve Loughran commented on YARN-2616: -- thanks. I'm going to pull this down into the main YARN-913 patch & sync up with changes, but will then post the patch here for it to be reviewed/completed in isolation. # I'll set things up for tests to go in, though I won't do the tests; I'll leave that as half the challenge. # Here's my evolving [Updated Hadoop style guide|https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md] > Add CLI client to the registry to list/view entries > --- > > Key: YARN-2616 > URL: https://issues.apache.org/jira/browse/YARN-2616 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Steve Loughran >Assignee: Akshay Radia > Attachments: yarn-2616-v1.patch, yarn-2616-v2.patch > > > The registry needs a CLI interface.
[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2624: --- Priority: Blocker (was: Major) Target Version/s: 2.6.0 Affects Version/s: 2.5.1 > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > > We have found that resource localization fails on a cluster with the following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat}
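Assuming, as the rename error suggests, that a numbered cache directory (here /data/yarn/nm/filecache/27) was left behind by an earlier run and collided with a newly issued number, one way to avoid the collision is to skip numbers whose directory already exists. The class below is a hypothetical illustration of that idea using java.nio.file. It reserves (creates) a fresh numbered directory, and how the localizer would then populate it is left out; this is not the NodeManager's localizer code, and the actual fix for this issue may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hands out numbered cache directories under root, skipping ones that exist. */
final class CacheDirAllocator {
    private final Path root;
    private long next;

    CacheDirAllocator(Path root, long firstNumber) {
        this.root = root;
        this.next = firstNumber;
    }

    /**
     * Creates and returns the next unused numbered directory. createDirectory
     * fails if the path already exists (check and create are one atomic step),
     * so a directory left over from an earlier run is skipped, not reused.
     */
    synchronized Path allocate() {
        while (true) {
            Path candidate = root.resolve(Long.toString(next++));
            try {
                return Files.createDirectory(candidate);
            } catch (FileAlreadyExistsException e) {
                // The JDK documents this as an optional specific exception;
                // the default providers report it. Try the next number.
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```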
[jira] [Updated] (YARN-2632) Document NM Restart feature
[ https://issues.apache.org/jira/browse/YARN-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2632: --- Priority: Blocker (was: Major) Marking this a blocker to ensure we don't miss it in 2.6. > Document NM Restart feature > --- > > Key: YARN-2632 > URL: https://issues.apache.org/jira/browse/YARN-2632 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Junping Du >Priority: Blocker > > As a new feature in YARN, we should document this feature's behavior, > configuration, and the things to pay attention to.
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1972: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-732 > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, > YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, > YARN-1972.trunk.5.patch > > > h1. Windows Secure Container Executor (WCE) > YARN-1063 adds the necessary infrastructure to launch a process as a domain > user, as a solution to the problem of having a security boundary between > processes executed in YARN containers and the Hadoop services. The WCE is a > container executor that leverages the winutils capabilities introduced in > YARN-1063 and launches containers as an OS process running as the job > submitter user. A description of the S4U infrastructure used by YARN-1063, and of the > alternatives considered, can be read on that JIRA. > The WCE is based on the DefaultContainerExecutor. It relies on the DCE to > drive the flow of execution, but it overrides some methods so that it: > * changes the DCE-created user cache directories to be owned by the job user > and by the nodemanager group. > * changes the actual container run command to use the 'createAsUser' command > of winutils task instead of 'create' > * runs the localization as a standalone process instead of an in-process Java > method call. This in turn relies on the winutils createAsUser feature to run > the localization as the job user. > > When compared to the LinuxContainerExecutor (LCE), the WCE has some minor > differences: > * it does not delegate the creation of the user cache directories to the > native implementation.
> * it does not require special handling to be able to delete user files > The WCE approach came from practical trial and error. I had > to iron out some issues around the Windows script shell limitations (command > line length) to get it to work, the biggest issue being the huge CLASSPATH > that is commonplace in Hadoop container executions. The job > container itself already deals with this via a so-called 'classpath > jar'; see HADOOP-8899 and YARN-316 for details. For launching the WCE localizer > as a separate container, the same issue had to be resolved, and I used the same > 'classpath jar' approach. > h2. Deployment Requirements > To use the WCE, set > `yarn.nodemanager.container-executor.class` to > `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` > and set `yarn.nodemanager.windows-secure-container-executor.group` to the name of a > Windows security group that the nodemanager service principal is a > member of (the equivalent of the LCE's > `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE > does not require any configuration outside of Hadoop's own yarn-site.xml. > For the WCE to work, the nodemanager must run as LocalSystem or as a service principal that is a > member of the local Administrators group. This follows from > the need to invoke the LoadUserProfile API, whose specification lists these requirements. > This is in addition to the SE_TCB privilege mentioned in > YARN-1063, although meeting this requirement automatically implies that the SE_TCB > privilege is held by the nodemanager. For the Linux speakers in the audience, > the requirement is basically to run NM as root. > h2. Dedicated high privilege Service > Due to the high privilege required by the WCE, we had discussed the need to > isolate the high privilege operations into a separate process, an 'executor' > service that is solely responsible for starting the containers (including the > localizer).
The NM would have to authenticate, authorize and communicate with > this service via an IPC mechanism and use this service to launch the > containers. I still believe we'll end up deploying such a service, but the > effort to onboard such a new, platform-specific service onto the project is > not trivial.
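The 'classpath jar' workaround mentioned in the WCE description can be sketched with the standard java.util.jar API: write a jar that holds nothing but a manifest whose Class-Path attribute lists the real entries, then start the child JVM with only that jar on its classpath, which keeps the command line short. The class below is a hypothetical illustration, not Hadoop's own helper from HADOOP-8899, and it uses absolute file: URLs for simplicity.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

final class ClasspathJar {
    private ClasspathJar() {}

    /**
     * Writes a manifest-only jar whose Class-Path lists the given entries, so
     * the launch command needs only "-cp <jar>" instead of a very long
     * CLASSPATH that can exceed Windows command-line limits.
     */
    static void write(Path jar, List<Path> entries) throws IOException {
        StringBuilder classPath = new StringBuilder();
        for (Path entry : entries) {
            if (classPath.length() > 0) {
                classPath.append(' '); // Class-Path entries are space-separated URLs
            }
            // toUri() yields a percent-encoded file: URL and appends '/' when
            // the path is an existing directory, which Class-Path requires.
            classPath.append(entry.toAbsolutePath().toUri());
        }
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0"); // needed for main attributes to be written
        main.put(Attributes.Name.CLASS_PATH, classPath.toString());
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
            jarOut.flush(); // the manifest is the only content
        }
    }
}
```

Manifest takes care of wrapping long Class-Path values into 72-byte continuation lines, so the attribute can hold a classpath far longer than the shell would accept.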
[jira] [Updated] (YARN-732) YARN support for container isolation on Windows
[ https://issues.apache.org/jira/browse/YARN-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-732: - Fix Version/s: (was: trunk-win) > YARN support for container isolation on Windows > --- > > Key: YARN-732 > URL: https://issues.apache.org/jira/browse/YARN-732 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: trunk-win >Reporter: Kyle Leckie > Labels: security > Attachments: winutils.diff > > > There is no ContainerExecutor on Windows that can launch containers in a > manner that provides: > 1) container isolation > 2) container execution with reduced rights > I am working on patches that will add the ability to launch containers in a > process with a reduced access token. > Update: After examining several approaches I have settled on launching the > task as a domain user. I have attached the current winutils diff, which is a > work in progress. > Work remaining: > - Create an isolated desktop for task processes. > - Set the integrity level of spawned processes to low.
[jira] [Updated] (YARN-2129) Add scheduling priority to the WindowsSecureContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2129: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-732 > Add scheduling priority to the WindowsSecureContainerExecutor > - > > Key: YARN-2129 > URL: https://issues.apache.org/jira/browse/YARN-2129 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-2129.1.patch, YARN-2129.2.patch > > > The WCE (YARN-1972) could and should honor > NM_CONTAINER_EXECUTOR_SCHED_PRIORITY.
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155116#comment-14155116 ] Hudson commented on YARN-1063: -- FAILURE: Integrated in Hadoop-trunk-Commit #6164 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6164/]) YARN-1063. Augmented Hadoop common winutils to have the ability to create containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c * hadoop-common-project/hadoop-common/src/main/winutils/symlink.c * hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h * hadoop-common-project/hadoop-common/src/main/winutils/task.c * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java * hadoop-common-project/hadoop-common/src/main/winutils/chown.c > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: Windows >Reporter: Kyle Leckie >Assignee: Remus Rusanu > Labels: security, windows > Fix For: 2.6.0 > > Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, > YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch > > > h1. Summary: > Securing a Hadoop cluster requires constructing some form of security > boundary around the processes executed in YARN containers. Isolation based on > separate Windows user accounts seems the most feasible. This approach is similar to the > approach taken by the existing LinuxContainerExecutor. The current patch to > winutils.exe adds the ability to create a process as a domain user. > h1. Alternative Methods considered: > h2. Process rights limited by security token restriction: > On Windows, access decisions are made by examining the security token of a > process.
It is possible to spawn a process with a restricted security token. > Any of the rights granted by SIDs of the default token may be restricted. It > is possible to see this in action by examining the security token of a > sandboxed process launched by a web browser. Typically the launched process > will have a fully restricted token and need to access machine resources > through a dedicated broker process that enforces a custom security policy. > This broker process mechanism would break compatibility with the typical > Hadoop container process, which must be able to use > standard function calls for disk and network IO. I performed some work > looking at ways to ACL the local files to the specific launched process without > granting rights to other processes launched on the same machine, but found > this to be an overly complex solution. > h2. Relying on APP containers: > Recent versions of Windows have the ability to launch processes within an > isolated container. Application containers are supported for execution of > WinRT-based executables. This method was ruled out due to the lack of > official support for standard Windows APIs. At some point in the future > Windows may support functionality similar to BSD jails or Linux containers; > at that point support for containers should be added. > h1. Create As User Feature Description: > h2. Usage: > A new subcommand was added to the set of task commands. Here is the syntax: > winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] > Some notes: > * The username specified is in the format of "user@domain" > * The machine executing this command must be joined to the domain of the user > specified > * The domain controller must allow the account executing the command access > to the user information. To allow this, join the account to the predefined group > labeled "Pre-Windows 2000 Compatible Access" > * The account running the command must have several rights on the local > machine.
These can be managed manually using secpol.msc: > ** "Act as part of the operating system" - SE_TCB_NAME > ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME > ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME > * The launched process will not have rights to the desktop, and so will not be > able to display any information or create UI. > * The launched process will have no network credentials. Any access of > network resources that requires domain authentication will fail. > h2. Implementation: > Winutils performs the following steps: > # Enable the required privileges for the current process. > # Register as a trusted process with the Local Security Authority (LSA). > # Create a new logon for the user passed on the command line. > # Load
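As a rough illustration of the `createAsUser` usage described above, the Java sketch below assembles the `winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]` argument list and checks the `user@domain` format before anything is launched. The class name `CreateAsUserCommand`, the sample task and user names, and the assumption that `winutils` resolves on the PATH are all hypothetical; this is not code from the YARN-1063 patch, and it only builds the arguments rather than starting a process.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of Hadoop): builds the argument vector for
 * "winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]".
 */
final class CreateAsUserCommand {
    private CreateAsUserCommand() {}

    static List<String> build(String taskName, String user, String commandLine) {
        // The usage notes require the username in "user@domain" form:
        // exactly one '@', with a non-empty part on each side.
        int at = user.indexOf('@');
        if (at <= 0 || at != user.lastIndexOf('@') || at == user.length() - 1) {
            throw new IllegalArgumentException("expected user@domain, got: " + user);
        }
        if (taskName.isEmpty() || commandLine.isEmpty()) {
            throw new IllegalArgumentException("task name and command line are required");
        }
        // One list element per argument; the whole command line stays a single
        // element, matching the single [COMMAND_LINE] slot in the syntax.
        return Arrays.asList("winutils", "task", "createAsUser", taskName, user, commandLine);
    }

    public static void main(String[] args) {
        List<String> cmd = build("container_01", "alice@EXAMPLE", "cmd /c launch.cmd");
        System.out.println(String.join(" ", cmd));
        // On a domain-joined Windows host whose account holds the rights listed
        // above, the list could be passed to new ProcessBuilder(cmd).start().
    }
}
```

Whether winutils needs extra quoting for particular command lines is not covered by this sketch; it only enforces the argument shape given in the usage notes.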
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155127#comment-14155127 ] Vinod Kumar Vavilapalli commented on YARN-1972: --- bq. Remus Rusanu Vinod Kumar Vavilapalli, as on YARN-1063, we can go ahead and address these comments as part of the YARN-2198 effort; it's not necessary to resolve these before these patches are committed. +1 for tracking the remaining issues at YARN-1063. This looks good, checking this in. > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, > YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, > YARN-1972.trunk.5.patch > > > h1. Windows Secure Container Executor (WCE) > YARN-1063 adds the necessary infrastructure to launch a process as a domain > user, as a solution to the problem of having a security boundary between > processes executed in YARN containers and the Hadoop services. The WCE is a > container executor that leverages the winutils capabilities introduced in > YARN-1063 and launches containers as an OS process running as the job > submitter user. A description of the S4U infrastructure used by YARN-1063, and of the > alternatives considered, can be found on that JIRA. > The WCE is based on the DefaultContainerExecutor (DCE). It relies on the DCE to > drive the flow of execution, but it overrides some methods in order to: > * change the DCE-created user cache directories to be owned by the job user > and by the nodemanager group. > * change the actual container run command to use the 'createAsUser' command > of winutils task instead of 'create' > * run the localization as a standalone process instead of an in-process Java > method call.
This in turn relies on the winutils createAsUser feature to run > the localization as the job user. > > When compared to LinuxContainerExecutor (LCE), the WCE has some minor > differences: > * it does not delegate the creation of the user cache directories to the > native implementation. > * it does not require special handling to be able to delete user files > The WCE approach came from practical trial and error. I had > to iron out some issues around the Windows script shell limitations (command > line length) to get it to work, the biggest issue being the huge CLASSPATH > that is commonplace in Hadoop container executions. The job > container itself already deals with this via a so-called 'classpath > jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched > as a separate container, the same issue had to be resolved, and I used the same > 'classpath jar' approach. > h2. Deployment Requirements > To use the WCE, set > `yarn.nodemanager.container-executor.class` to > `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` > and set `yarn.nodemanager.windows-secure-container-executor.group` to the name of a > Windows security group that the nodemanager service principal is a > member of (the equivalent of the LCE's > `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE > does not require any configuration outside of Hadoop's own yarn-site.xml. > For the WCE to work, the nodemanager must run as a service principal that is a > member of the local Administrators group, or as LocalSystem. This is derived from > the need to invoke the LoadUserProfile API, whose specification lists these > requirements. This is in addition to the SE_TCB privilege mentioned in > YARN-1063, but this requirement automatically implies that the SE_TCB > privilege is held by the nodemanager. For the Linux speakers in the audience, > the requirement is basically to run the NM as root.
> h2. Dedicated high privilege Service > Due to the high privilege required by the WCE, we had discussed the need to > isolate the high privilege operations into a separate process, an 'executor' > service that is solely responsible for starting the containers (including the > localizer). The NM would have to authenticate, authorize and communicate with > this service via an IPC mechanism and use this service to launch the > containers. I still believe we'll end up deploying such a service, but the > effort to onboard such a new platform-specific service onto the project is > not trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
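The WCE description above (reuse the DCE's flow, override a few steps) is essentially the template-method pattern. The sketch below is a hypothetical, much-simplified model of that idea, not the real `DefaultContainerExecutor` or `WindowsSecureContainerExecutor` classes: `SketchDefaultExecutor`, `SketchSecureExecutor`, their method names and the argument lists are invented for illustration, and the third override (running localization as a separate process) is omitted. The group passed to the constructor stands in for the `yarn.nodemanager.windows-secure-container-executor.group` setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, much-simplified stand-in for DefaultContainerExecutor (DCE). */
class SketchDefaultExecutor {
    /** The base class drives the flow; subclasses override individual steps. */
    final List<String> launch(String containerId, String user, String script) {
        List<String> steps = new ArrayList<>();
        steps.add("user cache owner: " + userCacheOwner(user));
        steps.add("run: " + String.join(" ", launchCommand(containerId, user, script)));
        return steps;
    }

    /** DCE-style: directories stay owned by the NodeManager account. */
    String userCacheOwner(String user) {
        return "nodemanager";
    }

    /** DCE-style: the task runs under the NodeManager's own account. */
    List<String> launchCommand(String containerId, String user, String script) {
        return Arrays.asList("winutils", "task", "create", containerId, script);
    }
}

/** Hypothetical stand-in for the WCE: same flow, two steps overridden. */
class SketchSecureExecutor extends SketchDefaultExecutor {
    // Plays the role of the windows-secure-container-executor.group setting.
    private final String nmGroup;

    SketchSecureExecutor(String nmGroup) {
        this.nmGroup = nmGroup;
    }

    @Override
    String userCacheOwner(String user) {
        // Owned by the job user, with the NodeManager's group.
        return user + ":" + nmGroup;
    }

    @Override
    List<String> launchCommand(String containerId, String user, String script) {
        // Same task, but created under the job submitter's account.
        return Arrays.asList("winutils", "task", "createAsUser", containerId, user, script);
    }

    public static void main(String[] args) {
        SketchDefaultExecutor executor = new SketchSecureExecutor("hadoop-nm");
        for (String step : executor.launch("container_01", "alice@EXAMPLE", "launch_container.cmd")) {
            System.out.println(step);
        }
    }
}
```

Because `launch` is `final` in the base class, the overall flow cannot drift between the two executors; only the ownership and command-building hooks change, mirroring how the description has the WCE lean on the DCE for the execution flow.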
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155136#comment-14155136 ] Zhijie Shen commented on YARN-2630: --- Makes sense. +1 > TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2630.1.patch, YARN-2630.2.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive previously failed AM container. But > DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
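One way an application master could tolerate the extra completed container described above is to count completions only for containers it knows about. The sketch below is a hypothetical illustration of that idea and not the YARN-2630 fix or DistributedShell's actual code; the class `CompletionTracker`, its methods and the container IDs are invented. Containers handed over from previous attempts are registered through `onRecovered`, so their completions still count, while the previous attempt's failed AM container is ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical AM-side bookkeeping, not DistributedShell's actual code: count
 * only completions of containers this attempt knows about, so an unexpected
 * report (such as the previous attempt's failed AM container) is ignored.
 */
final class CompletionTracker {
    private final Set<String> expected = new HashSet<>();
    private int completed;

    /** Containers allocated and launched by the current attempt. */
    void onLaunched(String containerId) {
        expected.add(containerId);
    }

    /** Still-running containers handed over from previous attempts on restart. */
    void onRecovered(List<String> containerIds) {
        expected.addAll(containerIds);
    }

    void onCompleted(List<String> containerIds) {
        for (String id : containerIds) {
            // remove() returns false for unknown or already-counted IDs.
            if (expected.remove(id)) {
                completed++;
            }
        }
    }

    int completed() {
        return completed;
    }

    public static void main(String[] args) {
        CompletionTracker t = new CompletionTracker();
        t.onRecovered(Arrays.asList("c_2"));
        t.onLaunched("c_3");
        // "c_1" stands for the failed AM container of the previous attempt.
        t.onCompleted(Arrays.asList("c_1", "c_2", "c_3"));
        System.out.println("completed = " + t.completed());
    }
}
```

Running `main` prints `completed = 2`: `c_1` was never registered, so only `c_2` and `c_3` are counted.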
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155139#comment-14155139 ] Hadoop QA commented on YARN-2615: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672344/YARN-2615.patch against trunk revision 3f25d91. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.crypto.random.TestOsSecureRandom org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5196//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5196//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5196//console This message is automatically generated. 
> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2615.patch > > > As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier should also be updated in the same way to allow > their fields to be extended in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
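YARN-668 is cited above as the model for making token identifiers extensible. The general idea behind protobuf-style encodings is that each field carries a tag and a length, so an older reader can skip fields it does not recognize. The Java sketch below shows that idea with a hand-rolled tag-length-value format; `TaggedIdentifier`, the tag numbers and the field values are hypothetical, and this is neither Hadoop's nor protobuf's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical tag-length-value codec (not Hadoop's or protobuf's API) showing
 * why tagged fields let an identifier grow: a reader that meets a tag it does
 * not know can skip the field instead of failing.
 */
final class TaggedIdentifier {
    private TaggedIdentifier() {}

    static byte[] write(Map<Integer, String> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (Map.Entry<Integer, String> f : fields.entrySet()) {
                byte[] value = f.getValue().getBytes(StandardCharsets.UTF_8);
                out.writeInt(f.getKey());   // field tag
                out.writeInt(value.length); // payload length
                out.write(value);           // payload
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Keeps fields with tags up to maxKnownTag; newer tags are skipped. */
    static Map<Integer, String> read(byte[] data, int maxKnownTag) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Map<Integer, String> known = new LinkedHashMap<>();
            while (in.available() > 0) {
                int tag = in.readInt();
                byte[] value = new byte[in.readInt()];
                in.readFully(value); // consume the payload even if it is dropped
                if (tag <= maxKnownTag) {
                    known.put(tag, new String(value, StandardCharsets.UTF_8));
                }
            }
            return known;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> v2 = new LinkedHashMap<>();
        v2.put(1, "app-attempt-1"); // field known to old readers
        v2.put(2, "alice");         // field known to old readers
        v2.put(3, "added-in-v2");   // field an old reader has never seen
        System.out.println(read(write(v2), 2)); // old reader keeps tags 1 and 2 only
    }
}
```

An old reader (`maxKnownTag = 2`) recovers fields 1 and 2 from a newer identifier that also carries field 3, instead of failing on the unexpected bytes. Real protobuf uses varint tags and wire types rather than the fixed 4-byte integers used here.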
[jira] [Created] (YARN-2635) TestRMRestart fails with FairScheduler
Wei Yan created YARN-2635: - Summary: TestRMRestart fails with FairScheduler Key: YARN-2635 URL: https://issues.apache.org/jira/browse/YARN-2635 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan If the scheduler is changed from the Capacity Scheduler to the Fair Scheduler, TestRMRestart fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155151#comment-14155151 ] Hudson commented on YARN-1972: -- FAILURE: Integrated in Hadoop-trunk-Commit #6165 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6165/]) YARN-1972. Added a secure container-executor for Windows. Contributed by Remus Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, > YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, > YARN-1972.trunk.5.patch
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155169#comment-14155169 ] Hadoop QA commented on YARN-1063: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657587/YARN-1063.6.patch against trunk revision 04b0843. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5197//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5197//console This message is automatically generated. > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: Windows >Reporter: Kyle Leckie >Assignee: Remus Rusanu > Labels: security, windows > Fix For: 2.6.0 > > Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, > YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1879: Attachment: YARN-1879.16.patch [~ozawa] I have updated your patch to compile with the latest trunk. [~jianhe], can you please take a look? > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
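The `Idempotent`/`AtMostOnce` annotations in this issue's title tell a retrying client whether a call may safely be re-sent, for example after an RM failover. The self-contained sketch below uses local stand-in annotations (`SketchIdempotent`, `SketchAtMostOnce`) and a hypothetical `SketchMasterProtocol`; Hadoop has its own `@Idempotent` and `@AtMostOnce` in `org.apache.hadoop.io.retry`, and which real `ApplicationMasterProtocol` methods receive which annotation is decided in the patch, not here. The policy strings returned by `RetryDecider` are also invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Local stand-ins so the sketch is self-contained; Hadoop has its own
// @Idempotent and @AtMostOnce annotations in org.apache.hadoop.io.retry.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchIdempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchAtMostOnce {}

/** Hypothetical protocol; method names and annotations are illustrative only. */
interface SketchMasterProtocol {
    @SketchIdempotent
    String register(String host);

    @SketchAtMostOnce
    String finish(String finalStatus);

    String unannotated();
}

final class RetryDecider {
    private RetryDecider() {}

    /**
     * Repeating an idempotent call is harmless; an at-most-once call may only be
     * retried if the server can detect and drop the duplicate; anything
     * unannotated is treated as unsafe to retry.
     */
    static String policyFor(Method m) {
        if (m.isAnnotationPresent(SketchIdempotent.class)) {
            return "retry";
        }
        if (m.isAnnotationPresent(SketchAtMostOnce.class)) {
            return "retry-with-dedup";
        }
        return "fail";
    }

    public static void main(String[] args) {
        for (Method m : SketchMasterProtocol.class.getMethods()) {
            System.out.println(m.getName() + " -> " + policyFor(m));
        }
    }
}
```

Reading the decision from runtime-retained annotations keeps the retry rules next to the protocol definition, so adding a method forces an explicit choice rather than silently inheriting a retry policy.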
[jira] [Updated] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2630: -- Attachment: YARN-2630.3.patch Uploaded a patch which renames NodeHeartbeatResponse#getFinishedContainersPulledByAM to getContainersToBeRemovedFromNM, as I think the latter name is more semantically correct if in the future we add another channel (besides being pulled by the AM) for removing containers from the NM. > TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive previously failed AM container. But > DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155220#comment-14155220 ] Jian He commented on YARN-2617: --- looks good, +1 > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.patch > > > We ([~chenchun]) are testing RM work-preserving restart and found the > following logs when we ran a simple MapReduce job "PI". The NM continuously > reported completed containers whose application had already finished, even after the AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > With the patch for YARN-1372, ApplicationImpl on the NM should guarantee that > already completed applications are cleaned up.
But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might not > receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, > and then schedules deletion of the application logs and sends the event. > * For LogAggregationService, aggregation might fail (e.g. if the user does not have HDFS > write permission), in which case it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
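The two cases above explain why stale entries can linger: with the default `NM_LOG_RETAIN_SECONDS` of 3 * 60 * 60 = 10,800 seconds (three hours), an application can stay in the NM's context long after it finished. The issue title suggests filtering such containers out of the heartbeat. The sketch below is a hypothetical illustration of that filter, not the committed YARN-2617 patch; `FinishedContainer`, `HeartbeatFilter` and the IDs are invented, and the real NM works with YARN record types and application states rather than a plain set of running application IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified finished-container record (not YARN's ContainerStatus). */
final class FinishedContainer {
    final String containerId;
    final String appId;

    FinishedContainer(String containerId, String appId) {
        this.containerId = containerId;
        this.appId = appId;
    }
}

/** Illustrates the idea in the issue title, not the committed patch. */
final class HeartbeatFilter {
    private HeartbeatFilter() {}

    /** Keeps only finished containers whose application the NM still considers running. */
    static List<FinishedContainer> toReport(List<FinishedContainer> finished, Set<String> runningApps) {
        List<FinishedContainer> report = new ArrayList<>();
        for (FinishedContainer c : finished) {
            if (runningApps.contains(c.appId)) {
                report.add(c);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<FinishedContainer> finished = Arrays.asList(
                new FinishedContainer("container_a_01", "app_a"),
                new FinishedContainer("container_b_07", "app_b"));
        // app_b already finished, but its entry may linger until log handling completes.
        for (FinishedContainer c : toReport(finished, Collections.singleton("app_a"))) {
            System.out.println("report " + c.containerId);
        }
    }
}
```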
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155227#comment-14155227 ] Zhijie Shen commented on YARN-2630: --- Would you please check "finishedContainersPulledByAM" is completely replaced in the code base? {code} -if (this.finishedContainersPulledByAM != null) { +if (this.containersToBeRemovedFromNM != null) { addFinishedContainersPulledByAMToProto(); } {code} {code} - public void addFinishedContainersPulledByAM( + public void addContainersToBeRemovedFromNM( final List finishedContainersPulledByAM) { if (finishedContainersPulledByAM == null) return; initFinishedContainersPulledByAM(); -this.finishedContainersPulledByAM.addAll(finishedContainersPulledByAM); +this.containersToBeRemovedFromNM.addAll(finishedContainersPulledByAM); {code} {code} - nhResponse.addFinishedContainersPulledByAM(finishedContainersPulledByAM); + nhResponse.addContainersToBeRemovedFromNM(finishedContainersPulledByAM); {code} {code} - response.addFinishedContainersPulledByAM( + response.addContainersToBeRemovedFromNM( new ArrayList(this.finishedContainersPulledByAM)); {code} > TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive previously failed AM container. But > DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155235#comment-14155235 ] Jian Fang commented on YARN-1680: - I may be wrong because I don't fully understand the logic. It seems your patch calculates the blacklisted resource for each application. Please clarify for me whether a blacklisted node is a cluster-level concept or an application-level one. What if multiple applications have different sets of blacklisted nodes? If the blacklisted node is at the cluster level, it seems the blacklisted resource should be calculated at the cluster level; that is to say, you need to get the blacklisted nodes from other applications as well. If it is only at the application level, I wonder how the blacklist-task-tracker command works in Hadoop 1. > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > The job's running reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 maps got killed), and MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are now running in the cluster. > MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, headRoom includes the blacklisted node's memory. This makes > jobs hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResources it returns are based on the cluster's free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
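The headroom problem in the description can be made concrete with a small calculation. The sketch below is hypothetical: the `Headroom` class is not scheduler code, it treats the blacklist as a per-application set (the reading of the patch in the comment above, which this sketch does not settle), and it assumes all 3 GB of remaining free memory (32 GB - 29 GB = 3 GB = 3072 MB) sits on the blacklisted NM-4, which the issue does not state.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical headroom calculation; not the CapacityScheduler or FairScheduler code. */
final class Headroom {
    private Headroom() {}

    /** Naive headroom: free memory summed over every node in the cluster. */
    static long naiveMb(Map<String, Long> freeMbByNode) {
        long total = 0;
        for (long free : freeMbByNode.values()) {
            total += free;
        }
        return total;
    }

    /** Headroom that skips free memory on nodes this application has blacklisted. */
    static long excludingBlacklistMb(Map<String, Long> freeMbByNode, Set<String> blacklist) {
        long total = 0;
        for (Map.Entry<String, Long> e : freeMbByNode.entrySet()) {
            if (!blacklist.contains(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // 4 x 8 GB nodes with 29 GB in use; assume all 3 GB of free memory is on NM-4.
        Map<String, Long> free = new LinkedHashMap<>();
        free.put("NM-1", 0L);
        free.put("NM-2", 0L);
        free.put("NM-3", 0L);
        free.put("NM-4", 3072L);
        Set<String> blacklist = Collections.singleton("NM-4");
        System.out.println("naive headroom MB: " + naiveMb(free));
        System.out.println("excluding blacklist MB: " + excludingBlacklistMb(free, blacklist));
    }
}
```

Under those assumptions the naive headroom is 3072 MB, so the AM sees room to schedule and does not preempt reducers, while the blacklist-aware value is 0 MB, which is presumably the signal that would let reducer preemption proceed.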
[jira] [Updated] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2630: -- Attachment: YARN-2630.4.patch Thanks, Zhijie! Updated the patch to fix the inconsistency. > TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, > YARN-2630.4.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive previously failed AM container. But > DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1972: -- Attachment: YARN-1972.delta.5-branch-2.patch The patch doesn't apply on branch-2. Generated it myself, attaching now. > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, > YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, > YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch > > > h1. Windows Secure Container Executor (WCE) > YARN-1063 adds the necessary infrasturcture to launch a process as a domain > user as a solution for the problem of having a security boundary between > processes executed in YARN containers and the Hadoop services. The WCE is a > container executor that leverages the winutils capabilities introduced in > YARN-1063 and launches containers as an OS process running as the job > submitter user. A description of the S4U infrastructure used by YARN-1063 > alternatives considered can be read on that JIRA. > The WCE is based on the DefaultContainerExecutor. It relies on the DCE to > drive the flow of execution, but it overwrrides some emthods to the effect of: > * change the DCE created user cache directories to be owned by the job user > and by the nodemanager group. > * changes the actual container run command to use the 'createAsUser' command > of winutils task instead of 'create' > * runs the localization as standalone process instead of an in-process Java > method call. This in turn relies on the winutil createAsUser feature to run > the localization as the job user. 
> > When compared to the LinuxContainerExecutor (LCE), the WCE has some minor > differences: > * it does not delegate the creation of the user cache directories to the > native implementation. > * it does not require special handling to be able to delete user files. > The WCE approach came from practical trial and error. I had > to iron out some issues around the Windows script shell limitations (command > line length) to get it to work, the biggest issue being the huge CLASSPATH > that is commonplace in Hadoop container executions. The job > container itself already deals with this via a so-called 'classpath > jar'; see HADOOP-8899 and YARN-316 for details. The WCE localizer, launched > as a separate container, had to solve the same issue, and I used the same > 'classpath jar' approach. > h2. Deployment Requirements > To use the WCE, set > `yarn.nodemanager.container-executor.class` to > `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` > and set `yarn.nodemanager.windows-secure-container-executor.group` to the name of a > Windows security group of which the nodemanager service principal is a > member (the equivalent of the LCE's > `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE > does not require any configuration outside of Hadoop's own yarn-site.xml. > For the WCE to work, the nodemanager must run as a service principal that is a > member of the local Administrators group, or as LocalSystem. This follows from > the need to invoke the LoadUserProfile API, whose specification lists these > requirements. This is in addition to the SE_TCB privilege mentioned in > YARN-1063, though meeting this requirement automatically implies that the SE_TCB > privilege is held by the nodemanager. For the Linux speakers in the audience, > the requirement is basically to run the NM as root.
> h2. Dedicated high privilege Service > Due to the high privilege required by the WCE, we had discussed the need to > isolate the high-privilege operations into a separate process, an 'executor' > service that is solely responsible for starting the containers (including the > localizer). The NM would have to authenticate, authorize and communicate with > this service via an IPC mechanism and use this service to launch the > containers. I still believe we'll end up deploying such a service, but the > effort to onboard such a new platform-specific service onto the project is > not trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
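The 'classpath jar' workaround mentioned in the WCE description can be sketched with the JDK's `java.util.jar` API: instead of passing a very long `-classpath` argument, write an otherwise empty jar whose manifest `Class-Path` attribute lists every entry, and put only that jar on the command line. This is an illustrative sketch, not Hadoop's own implementation (see HADOOP-8899 for that); the class name `ClasspathJar` is hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/**
 * Sketch of a manifest-only "classpath jar" (hypothetical helper). The jar
 * contains nothing but META-INF/MANIFEST.MF, whose Class-Path attribute
 * lists the real classpath entries.
 */
final class ClasspathJar {
  private ClasspathJar() {}

  static File write(File target, List<File> entries) throws IOException {
    StringBuilder classPath = new StringBuilder();
    for (File entry : entries) {
      if (classPath.length() > 0) {
        classPath.append(' '); // Class-Path entries are space-separated
      }
      // toURI() percent-encodes spaces and appends '/' for existing directories,
      // which the class loader needs in order to treat an entry as a directory.
      classPath.append(entry.getAbsoluteFile().toURI().toString());
    }
    Manifest manifest = new Manifest();
    Attributes main = manifest.getMainAttributes();
    // Without Manifest-Version, Manifest.write() emits no main attributes at all.
    main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    main.put(Attributes.Name.CLASS_PATH, classPath.toString());
    try (OutputStream out = new FileOutputStream(target);
         JarOutputStream jar = new JarOutputStream(out, manifest)) {
      jar.finish(); // the manifest is the only entry
    }
    return target;
  }
}
```

Encoding each entry as a URI matters here: because `Class-Path` uses spaces as separators, a raw path such as `lib dir` would otherwise be split into two bogus entries.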
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155259#comment-14155259 ] Hadoop QA commented on YARN-1972: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672375/YARN-1972.delta.5-branch-2.patch against trunk revision 1f5b42a. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5200//console This message is automatically generated.
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155282#comment-14155282 ] Jian Fang commented on YARN-1680: - Also, it seems the variable blackListedResources in SchedulerApplicationAttempt is not initialized in YARN-1680-WIP.patch, which causes an NPE. > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > A job is running reducer tasks that occupy 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 maps got killed), and the MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are now running in the cluster. > The MRAppMaster does not preempt the reducers because, in the reducer preemption > calculation, the headroom includes the blacklisted node's memory. This makes > jobs hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResources it returns considers the cluster's free > memory).
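The headroom problem in the description can be reproduced with simple arithmetic. The Java sketch below assumes a hypothetical per-node split of the 29GB in use (NM-1 to NM-3 fully used, 5GB used on NM-4); that split is an illustration, not data from the report. It also ignores the queue and user limits a real scheduler applies, and `NodeMem` and `Headroom` are illustrative names, not YARN classes.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical per-node memory view, in MB. */
final class NodeMem {
  final String name;
  final long capacityMb;
  final long usedMb;

  NodeMem(String name, long capacityMb, long usedMb) {
    this.name = name;
    this.capacityMb = capacityMb;
    this.usedMb = usedMb;
  }

  long freeMb() {
    return capacityMb - usedMb;
  }
}

final class Headroom {
  private Headroom() {}

  /** Behaviour described in the issue: all free memory counts, blacklist ignored. */
  static long ignoringBlacklist(List<NodeMem> nodes) {
    long free = 0;
    for (NodeMem n : nodes) {
      free += n.freeMb();
    }
    return free;
  }

  /** Blacklist-aware: free memory on nodes the application blacklisted is excluded. */
  static long excludingBlacklist(List<NodeMem> nodes, Set<String> blacklist) {
    long free = 0;
    for (NodeMem n : nodes) {
      if (!blacklist.contains(n.name)) {
        free += n.freeMb();
      }
    }
    return free;
  }
}
```

With that split, the blacklist-unaware headroom is 3072 MB, all of it on the blacklisted NM-4, so the MRAppMaster sees room for new maps and has no reason to preempt reducers; excluding NM-4 yields 0 MB, which is the signal that would let preemption kick in.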
[jira] [Updated] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2628: Attachment: apache-yarn-2628.0.patch Uploaded a patch with a fix and a test case. > Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java: > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers. > Since it uses the greaterThanOrEqual function, we can end up in a situation where > greaterThanOrEqual returns true even though the node may not have enough CPU or > memory to actually run the container.
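The mismatch can be illustrated with a simplified model. The sketch below compares only dominant shares (the real DominantResourceCalculator also looks at the non-dominant share when dominant shares tie) and sets it beside a component-wise check in the spirit of Hadoop's `Resources.fitsIn`. `Res` and `DrfCompare` are hypothetical stand-ins, the numbers are made up for illustration, and this is not the attached patch.

```java
/** Minimal (memory MB, vcores) pair; a stand-in, not YARN's Resource class. */
final class Res {
  final long memoryMb;
  final int vcores;

  Res(long memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }
}

final class DrfCompare {
  private DrfCompare() {}

  /** The larger of the memory share and the vcore share of the cluster. */
  static double dominantShare(Res cluster, Res r) {
    return Math.max((double) r.memoryMb / cluster.memoryMb,
                    (double) r.vcores / cluster.vcores);
  }

  /** Simplified dominant-share comparison (no tie-break on the other dimension). */
  static boolean greaterThanOrEqual(Res cluster, Res lhs, Res rhs) {
    return dominantShare(cluster, lhs) >= dominantShare(cluster, rhs);
  }

  /** Component-wise check: every dimension of 'smaller' fits into 'bigger'. */
  static boolean fitsIn(Res smaller, Res bigger) {
    return smaller.memoryMb <= bigger.memoryMb && smaller.vcores <= bigger.vcores;
  }
}
```

For a 32768 MB / 32 vcore cluster, a node with 4096 MB and 0 vcores free has a dominant share of 0.125, versus 0.03125 for a 1024 MB / 1 vcore minimum allocation, so the dominant-share test passes even though no container needing a vcore can fit there.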
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155295#comment-14155295 ] Craig Welch commented on YARN-1680: --- As I recall, blacklisted nodes are application-level.
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155302#comment-14155302 ] Hadoop QA commented on YARN-1879: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672365/YARN-1879.16.patch against trunk revision 737f280. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5198//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5198//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5198//console This message is automatically generated. 
> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > >
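For context on the two annotations named in this issue: an idempotent call can be repeated with the same effect, so a client may simply re-send it after a ResourceManager failover, while an at-most-once call is not naturally repeatable and relies on the server recognizing duplicates (Hadoop's RPC layer uses a retry cache for this). The sketch below defines local stand-ins named after Hadoop's `org.apache.hadoop.io.retry.Idempotent` and `AtMostOnce` and shows a failover retry decision based on them. `SampleProtocol` and its methods are hypothetical and say nothing about which `ApplicationMasterProtocol` methods the patch annotates.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Local stand-ins named after org.apache.hadoop.io.retry.Idempotent / AtMostOnce.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Idempotent {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AtMostOnce {}

/** Hypothetical protocol used only for illustration. */
interface SampleProtocol {
  @Idempotent
  String register(String host);   // safe to repeat as-is

  @AtMostOnce
  void finish(String status);     // server must suppress duplicate executions

  void unannotated();             // no retry guarantee declared
}

final class FailoverRetryPolicy {
  private FailoverRetryPolicy() {}

  /** True if a call may be re-sent to the newly active server after a failover. */
  static boolean mayRetryOnFailover(Method method) {
    return method.isAnnotationPresent(Idempotent.class)
        || method.isAnnotationPresent(AtMostOnce.class);
  }
}
```

Both annotations permit a retry; the difference is who guarantees correctness: the method's own semantics for `@Idempotent`, the server's duplicate detection for `@AtMostOnce`.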
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155318#comment-14155318 ] Hadoop QA commented on YARN-2630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672368/YARN-2630.3.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5199//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5199//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5199//console This message is automatically generated. 
> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, > YARN-2630.4.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive the previously failed AM container, but the > DistributedShell logic does not expect this extra completed container.
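One way an AM can tolerate the extra completion is to count only containers it knows about. The sketch below is hypothetical and is not the attached patch: `CompletionTracker` records container IDs launched by the current attempt, plus recovered ones (in a real AM these could come from `RegisterApplicationMasterResponse#getContainersFromPreviousAttempts()`), and ignores completions for anything else, such as the previous attempt's failed AM container. Container IDs are plain strings here for brevity.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical AM-side bookkeeping: completions are counted only for
 * containers this attempt launched or explicitly recovered.
 */
final class CompletionTracker {
  private final Set<String> known = new HashSet<>();
  private int completed;
  private int ignored;

  /** Register a container launched by this attempt. */
  void onLaunched(String containerId) {
    known.add(containerId);
  }

  /** Register a still-running container handed over from a previous attempt. */
  void onRecovered(String containerId) {
    known.add(containerId);
  }

  /** Called for each completed-container report from the RM. */
  void onCompleted(String containerId) {
    if (known.remove(containerId)) {
      completed++;
    } else {
      ignored++; // e.g. the previous attempt's failed AM container
    }
  }

  int completed() {
    return completed;
  }

  int ignored() {
    return ignored;
  }
}
```

Removing the ID on completion also means a duplicated completion report for the same container is counted once and then ignored.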
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155344#comment-14155344 ] Jian Fang commented on YARN-1680: - Is there any behavior change from Hadoop 1 to Hadoop 2 for blacklisted nodes? It seems HADOOP-5643 discussed the ability to blacklist a tasktracker. We have a use case to blacklist a node at the cluster level before decommissioning it, so as to remove the node gracefully. If the blacklist is only application-level, then we have to figure out something else.
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155352#comment-14155352 ] Hadoop QA commented on YARN-2630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672374/YARN-2630.4.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5201//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5201//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5201//console This message is automatically generated. 
[jira] [Created] (YARN-2636) Windows Secure Container Executor: add unit tests for WSCE
Remus Rusanu created YARN-2636: -- Summary: Windows Secure Container Executor: add unit tests for WSCE Key: YARN-2636 URL: https://issues.apache.org/jira/browse/YARN-2636 Project: Hadoop YARN Issue Type: Sub-task Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical As the title says, the WSCE has no checked-in unit tests. Much of the functionality depends on the elevated hadoopwinutilsvc service and cannot be tested, but let's test what can be mocked in Java.
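A common way to make such code testable is to put the elevated operations behind a small interface and substitute a recording fake in tests. The sketch below applies that to the first WCE override described in YARN-1972 (making user cache directories owned by the job user and the nodemanager group). `ElevatedOps`, `UserCacheSetup`, `RecordingOps` and the group name used in the example are hypothetical, and the fake is hand-written rather than produced by a mocking library.

```java
import java.util.ArrayList;
import java.util.List;

/** Seam in front of operations that need the elevated service (hypothetical). */
interface ElevatedOps {
  void chown(String path, String owner, String group);
}

/** Executor fragment whose logic can be tested without elevation. */
final class UserCacheSetup {
  private final ElevatedOps ops;
  private final String nodeManagerGroup;

  UserCacheSetup(ElevatedOps ops, String nodeManagerGroup) {
    this.ops = ops;
    this.nodeManagerGroup = nodeManagerGroup;
  }

  /** Hands every user cache directory to the job user and the nodemanager group. */
  void prepare(String jobUser, List<String> userCacheDirs) {
    for (String dir : userCacheDirs) {
      ops.chown(dir, jobUser, nodeManagerGroup);
    }
  }
}

/** Test double: records calls instead of touching the file system. */
final class RecordingOps implements ElevatedOps {
  final List<String> calls = new ArrayList<>();

  @Override
  public void chown(String path, String owner, String group) {
    calls.add(path + " -> " + owner + ":" + group);
  }
}
```

In production the `ElevatedOps` implementation would talk to the elevated service; in a unit test `RecordingOps` lets the test assert exactly which ownership changes the executor requested.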
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155360#comment-14155360 ] Craig Welch commented on YARN-1680: --- There are different kinds of blacklisting; the one at issue in this JIRA is the application-level one. The cluster-level one ends up with the node's resource value being removed from the cluster resource, and it doesn't need to be addressed here, because removing the node from the cluster resource already removes its resource amount from any headroom calculation. This JIRA addresses the application-level blacklist, which needs to be handled at this level.
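The distinction can be shown with a toy model. In the hypothetical `BlacklistModel` below, a cluster-level removal (for example an unhealthy or decommissioned node) drops the node from the cluster's free resource, so every application's headroom shrinks automatically; an application-level blacklist leaves the cluster resource unchanged and must be subtracted when computing that one application's headroom. It models memory only, in MB, and ignores queue and user limits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the two kinds of blacklisting (memory only, in MB). */
final class BlacklistModel {
  private final Map<String, Long> freeMbByNode = new HashMap<>();

  void addNode(String node, long freeMb) {
    freeMbByNode.put(node, freeMb);
  }

  /** Cluster-level: the node leaves the cluster resource for every application. */
  void removeNode(String node) {
    freeMbByNode.remove(node);
  }

  /** Per-application headroom: app-level blacklisted nodes are subtracted here. */
  long headroomMb(Set<String> appBlacklist) {
    long free = 0;
    for (Map.Entry<String, Long> e : freeMbByNode.entrySet()) {
      if (!appBlacklist.contains(e.getKey())) {
        free += e.getValue();
      }
    }
    return free;
  }
}
```

For example, with 2048, 1024 and 4096 MB free on three nodes, an application that blacklisted the 1024 MB node sees 6144 MB; after the 4096 MB node is removed at the cluster level, that application sees 2048 MB and an application without a blacklist sees 3072 MB.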
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Description: I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API (a JSON counterpart is also available): {code:xml} 7680 7 application_1412191664217_0001 appattempt_1412191664217_0001_01 default 6144 6 3 1024 1 6 true 20 localMachine /default-rack * ... {code} was: I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested to gain more insightful information into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API: {code:xml} 96256 94 application_ appattempt_ default 96256 94 3 1024 1 /default-rack 94 true 20 1024 1 * 94 true 20 1024 1 master 94 true 20 {code} > Resource Request REST API for YARN > -- > > Key: YARN-2408 > URL: https://issues.apache.org/jira/browse/YARN-2408 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp >Reporter: Renan DelValle > Labels: features > Attachments: YARN-2408-3.patch
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: YARN-2408.4.patch Clustered resource requests that have the same priority, same number of containers, same relax locality, and same number of cores.
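The clustering described in this update amounts to grouping requests by a composite key. The sketch below is hypothetical and is not code from YARN-2408.4.patch: `Req` flattens one request per resource name, and the key uses exactly the four fields listed in the comment (priority, number of containers, relax locality, number of cores); memory is not part of the key because the comment does not mention it. The test values loosely echo the proposed sample, but since the element names are missing from that sample, which number maps to which field is a guess.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** One flattened resource request (hypothetical shape, one per resource name). */
final class Req {
  final int priority;
  final int numContainers;
  final boolean relaxLocality;
  final int vcores;
  final String resourceName;

  Req(int priority, int numContainers, boolean relaxLocality, int vcores, String resourceName) {
    this.priority = priority;
    this.numContainers = numContainers;
    this.relaxLocality = relaxLocality;
    this.vcores = vcores;
    this.resourceName = resourceName;
  }
}

/** Grouping key: the four fields named in the update comment. */
final class ReqKey {
  final int priority;
  final int numContainers;
  final boolean relaxLocality;
  final int vcores;

  ReqKey(Req r) {
    this.priority = r.priority;
    this.numContainers = r.numContainers;
    this.relaxLocality = r.relaxLocality;
    this.vcores = r.vcores;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ReqKey)) return false;
    ReqKey k = (ReqKey) o;
    return priority == k.priority && numContainers == k.numContainers
        && relaxLocality == k.relaxLocality && vcores == k.vcores;
  }

  @Override
  public int hashCode() {
    return Objects.hash(priority, numContainers, relaxLocality, vcores);
  }
}

final class RequestClustering {
  private RequestClustering() {}

  /** Groups requests sharing a key; groups and resource names keep input order. */
  static Map<ReqKey, List<String>> cluster(List<Req> requests) {
    Map<ReqKey, List<String>> groups = new LinkedHashMap<>();
    for (Req r : requests) {
      groups.computeIfAbsent(new ReqKey(r), k -> new ArrayList<>()).add(r.resourceName);
    }
    return groups;
  }
}
```

A `LinkedHashMap` keeps the output deterministic, which is convenient for a REST response that clients may diff between snapshots.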
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: (was: YARN-2408-3.patch)
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155373#comment-14155373 ] Remus Rusanu commented on YARN-1063: Contributor credit should also go to Kyle Leckie. > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: Windows >Reporter: Kyle Leckie >Assignee: Remus Rusanu > Labels: security, windows > Fix For: 2.6.0 > > Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, > YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch > > > h1. Summary: > Securing a Hadoop cluster requires constructing some form of security > boundary around the processes executed in YARN containers. Isolation based on > Windows user isolation seems most feasible. This approach is similar to the > approach taken by the existing LinuxContainerExecutor. The current patch to > winutils.exe adds the ability to create a process as a domain user. > h1. Alternative Methods considered: > h2. Process rights limited by security token restriction: > On Windows, access decisions are made by examining the security token of a > process. It is possible to spawn a process with a restricted security token. > Any of the rights granted by SIDs of the default token may be restricted. It > is possible to see this in action by examining the security token of a > sandboxed process launched by a web browser. Typically the launched process > will have a fully restricted token and need to access machine resources > through a dedicated broker process that enforces a custom security policy. > This broker process mechanism would break compatibility with the typical > Hadoop container process. The container process must be able to utilize > standard function calls for disk and network IO.
I performed some work > looking at ways to ACL the local files to the specific launched process without > granting rights to other processes launched on the same machine, but found > this to be an overly complex solution. > h2. Relying on APP containers: > Recent versions of Windows have the ability to launch processes within an > isolated container. Application containers are supported for execution of > WinRT-based executables. This method was ruled out due to the lack of > official support for standard Windows APIs. At some point in the future > Windows may support functionality similar to BSD jails or Linux containers; > at that point, support for containers should be added. > h1. Create As User Feature Description: > h2. Usage: > A new subcommand was added to the set of task commands. Here is the syntax: > winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] > Some notes: > * The username specified is in the format of "user@domain" > * The machine executing this command must be joined to the domain of the user > specified > * The domain controller must allow the account executing the command access > to the user information. To do this, join the account to the predefined group > labeled "Pre-Windows 2000 Compatible Access" > * The account running the command must have several rights on the local > machine. These can be managed manually using secpol.msc: > ** "Act as part of the operating system" - SE_TCB_NAME > ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME > ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME > * The launched process will not have rights to the desktop and so will not be > able to display any information or create UI. > * The launched process will have no network credentials. Any access to > network resources that requires domain authentication will fail. > h2. Implementation: > Winutils performs the following steps: > # Enable the required privileges for the current process.
> # Register as a trusted process with the Local Security Authority (LSA). > # Create a new logon for the user passed on the command line. > # Load/Create a profile on the local machine for the new logon. > # Create a new environment for the new logon. > # Launch the new process in a job with the task name specified and using the > created logon. > # Wait for the job to exit. > h2. Future work: > The following work was scoped out of this check-in: > * Support for non-domain users or machines that are not domain-joined. > * Support for privilege isolation by running the task launcher in a high > privilege service with access over an ACLed named pipe.
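The createAsUser syntax and the "user@domain" rule documented in YARN-1063 can be captured in a small argument parser. This Java sketch is hypothetical (winutils itself is native code) and checks only the documented shape: the `task createAsUser` subcommand followed by three arguments, with a username containing exactly one '@' that separates a non-empty user from a non-empty domain.

```java
/**
 * Hypothetical parser for the documented syntax
 *   winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 * (args exclude the executable name). Only the shape is validated.
 */
final class CreateAsUserArgs {
  final String taskName;
  final String user;
  final String domain;
  final String commandLine;

  private CreateAsUserArgs(String taskName, String user, String domain, String commandLine) {
    this.taskName = taskName;
    this.user = user;
    this.domain = domain;
    this.commandLine = commandLine;
  }

  static CreateAsUserArgs parse(String[] args) {
    if (args.length != 5 || !"task".equals(args[0]) || !"createAsUser".equals(args[1])) {
      throw new IllegalArgumentException(
          "usage: task createAsUser TASKNAME USERNAME COMMAND_LINE");
    }
    String account = args[3];
    int at = account.indexOf('@');
    // Exactly one '@', with a non-empty user part and a non-empty domain part.
    if (at <= 0 || at != account.lastIndexOf('@') || at == account.length() - 1) {
      throw new IllegalArgumentException("USERNAME must be user@domain: " + account);
    }
    return new CreateAsUserArgs(
        args[2], account.substring(0, at), account.substring(at + 1), args[4]);
  }
}
```

Rejecting a bare account name up front gives a clearer error than letting the later logon step (step 3 above) fail.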
[jira] [Commented] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155372#comment-14155372 ] Hadoop QA commented on YARN-2408: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672388/YARN-2408.4.patch against trunk revision 1f5b42a. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5203//console This message is automatically generated. > Resource Request REST API for YARN > -- > > Key: YARN-2408 > URL: https://issues.apache.org/jira/browse/YARN-2408 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp >Reporter: Renan DelValle > Labels: features > Attachments: YARN-2408.4.patch > > > I’m proposing a new REST API for YARN which exposes a snapshot of the > Resource Requests that exist inside of the Scheduler. My motivation behind > this new feature is to allow external software to monitor the amount of > resources being requested to gain more insightful information into cluster > usage than is already provided. The API can also be used by external software > to detect a starved application and alert the appropriate users and/or sys > admin so that the problem may be remedied. > Here is the proposed API (a JSON counterpart is also available): > {code:xml} > > 7680 > 7 > > application_1412191664217_0001 > > appattempt_1412191664217_0001_01 > default > 6144 > 6 > 3 > > > 1024 > 1 > 6 > true > 20 > > localMachine > /default-rack > * > > > > > > ... > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
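The XML tags of the proposed snapshot did not survive in this archive, so the field names below are assumptions. As a minimal sketch of how external software might total outstanding demand from such a snapshot, the hypothetical `PendingRequest` and `DemandSummary` classes sum per-container memory times container count for each application. They count only the "*" (any location) entries, on the assumption that, as in YARN's ResourceRequest model, host- and rack-level entries repeat the same containers as locality preferences.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical flattened view of one entry in the proposed snapshot (field names assumed). */
final class PendingRequest {
    final String appId;
    final String resourceName;  // host, rack, or "*" for any location
    final long memoryMb;        // memory per container
    final int numContainers;    // containers still wanted at this location level

    PendingRequest(String appId, String resourceName, long memoryMb, int numContainers) {
        this.appId = appId;
        this.resourceName = resourceName;
        this.memoryMb = memoryMb;
        this.numContainers = numContainers;
    }
}

final class DemandSummary {
    private DemandSummary() {}

    /**
     * Outstanding memory (MB) per application. Only "*" entries are summed,
     * because host- and rack-level entries describe locality preferences for
     * the same containers and would otherwise be double counted.
     */
    static Map<String, Long> pendingMemoryByApp(List<PendingRequest> requests) {
        Map<String, Long> totals = new TreeMap<>();
        for (PendingRequest r : requests) {
            if ("*".equals(r.resourceName)) {
                totals.merge(r.appId, r.memoryMb * r.numContainers, Long::sum);
            }
        }
        return totals;
    }
}
```

Values such as 1024, 6 and 6144 in the sample would be consistent with this reading (6 x 1024 MB = 6144 MB), but that mapping is a guess, not something the proposal states.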
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155381#comment-14155381 ] Jian Fang commented on YARN-1680: - Thanks, Craig, for your clarification. Is a cluster-level blacklisted node what is called an unhealthy node? I checked the Hadoop 2 code, but only found the cluster-level blacklist related to parameters such as yarn.nodemanager.health-checker.script.path. Are there any other code paths for the cluster-level blacklist in Hadoop 2? > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > A running job's reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 maps got killed), and MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are now running in the cluster. > MRAppMaster does not preempt the reducers because, in the reducer preemption > calculation, headRoom includes the blacklisted node's memory. This makes > jobs hang forever (ResourceManager does not assign any new containers on > blacklisted nodes, but the availableResources it returns still counts the cluster's free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
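The headroom problem can be illustrated with a small calculation. This sketch uses hypothetical `NodeFree` and `Headroom` classes and an assumed placement of the 3GB of free memory (all of it on NM-4); it contrasts a cluster-wide headroom with one that skips blacklisted nodes.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical per-node view: free memory currently available on a NodeManager. */
final class NodeFree {
    final String nodeId;
    final long freeMb;

    NodeFree(String nodeId, long freeMb) {
        this.nodeId = nodeId;
        this.freeMb = freeMb;
    }
}

final class Headroom {
    private Headroom() {}

    /** Free memory summed over every node, roughly what the report says the AM is given. */
    static long clusterWide(List<NodeFree> nodes) {
        long sum = 0;
        for (NodeFree n : nodes) {
            sum += n.freeMb;
        }
        return sum;
    }

    /** Free memory summed only over nodes the AM has not blacklisted. */
    static long excludingBlacklisted(List<NodeFree> nodes, Set<String> blacklisted) {
        long sum = 0;
        for (NodeFree n : nodes) {
            if (!blacklisted.contains(n.nodeId)) {
                sum += n.freeMb;
            }
        }
        return sum;
    }
}
```

With 32GB of capacity and 29GB in use, 3GB (3072 MB) is free. If all of it sits on the blacklisted NM-4, the cluster-wide figure still reports 3072 MB of headroom, so the AM sees no reason to preempt reducers, while the filtered figure is 0 MB.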
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155393#comment-14155393 ] Hadoop QA commented on YARN-2628: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672381/apache-yarn-2628.0.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5202//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5202//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5202//console This message is automatically generated. 
> Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java - > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers. > Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
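The following simplified model shows why a dominant-share comparison can report "enough" when one dimension is exhausted. It uses a hypothetical `Res` class rather than YARN's `Resource` and compares only dominant shares. YARN's `DominantResourceCalculator` may break ties differently, so treat this as an illustration of the failure mode, not of its exact code.

```java
/** Simplified two-dimensional resource for illustration; not YARN's Resource class. */
final class Res {
    final long memoryMb;
    final int vCores;

    Res(long memoryMb, int vCores) {
        this.memoryMb = memoryMb;
        this.vCores = vCores;
    }
}

final class DrfCheck {
    private DrfCheck() {}

    /** Dominant share: the larger of r's memory and CPU fractions of the cluster. */
    static double dominantShare(Res r, Res cluster) {
        return Math.max((double) r.memoryMb / cluster.memoryMb,
                        (double) r.vCores / cluster.vCores);
    }

    /** Compares only dominant shares, so one large dimension can mask a zero in the other. */
    static boolean dominantGreaterOrEqual(Res a, Res b, Res cluster) {
        return dominantShare(a, cluster) >= dominantShare(b, cluster);
    }

    /** Component-wise check: every dimension of needed must fit in available. */
    static boolean fitsIn(Res needed, Res available) {
        return needed.memoryMb <= available.memoryMb && needed.vCores <= available.vCores;
    }
}
```

With an assumed cluster of 32768 MB and 32 vcores, a node offering 4096 MB and 0 vcores has a dominant share of 0.125, above the 0.03125 of a 1024 MB / 1 vcore minimum allocation. The comparison therefore passes even though `fitsIn` correctly reports that no container can be placed.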
[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2617: -- Attachment: YARN-2617.5.patch Just added one more log statement myself; pending Jenkins. > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.patch > > > We ([~chenchun]) are testing RM work-preserving restart and found the > following logs when we ran a simple MapReduce job "PI". The NM continuously > reported completed containers whose application, and its AM, > had already finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on the NM should guarantee to clean > up already completed applications. 
However, it only removes the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED', and the NM might > not receive this event for a long time, or at all. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec (3 hours) by default, > before it is scheduled to delete the application logs and send the event. > * For LogAggregationService, log aggregation might fail (e.g. if the user does not have HDFS > write permission), in which case it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
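One way to express the title's intent is to filter completed containers by whether their application is still known to the NodeManager before reporting them. This is a minimal sketch with hypothetical `CompletedContainer` and `HeartbeatFilter` classes; it is not the attached patch, whose exact mechanism is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical record of a finished container waiting to be reported to the RM. */
final class CompletedContainer {
    final String containerId;
    final String appId;

    CompletedContainer(String containerId, String appId) {
        this.containerId = containerId;
        this.appId = appId;
    }
}

final class HeartbeatFilter {
    private HeartbeatFilter() {}

    /** Keeps only completed containers whose application the NM still tracks as running. */
    static List<CompletedContainer> toReport(List<CompletedContainer> completed,
                                             Set<String> runningApps) {
        List<CompletedContainer> out = new ArrayList<>();
        for (CompletedContainer c : completed) {
            if (runningApps.contains(c.appId)) {
                out.add(c);
            }
        }
        return out;
    }
}
```

The design question the report raises is where "running" is decided: if it depends on the APPLICATION_LOG_HANDLING_FINISHED event, the filter inherits the same delays described above.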
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155399#comment-14155399 ] Tsuyoshi OZAWA commented on YARN-1879: -- Sorry for the delay, and thanks for updating the patch, [~adhoot]. The test failure does not look related to the patch. Let me attach a patch that includes your comment changes. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155398#comment-14155398 ] Varun Vasudev commented on YARN-2628: - The release audit error is from an HDFS file and is unrelated. > Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java - > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers. > Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155400#comment-14155400 ] Tsuyoshi OZAWA commented on YARN-1879: -- About the release audit warning, it's also not related. {quote} !? /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-hdfs-project/hadoop-hdfs/.gitattributes Lines that start with ? in the release audit report indicate files that do not have an Apache license header {quote} > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155407#comment-14155407 ] Tsuyoshi OZAWA commented on YARN-1879: -- {quote} >APIs that added trigger flag. APIs that added Idempotent/AtOnce annotation? {quote} I think ">APIs that are added trigger flag." is correct, so updating it. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
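To illustrate what the Idempotent/AtMostOnce annotations are for, the sketch below defines local stand-in annotations (hypothetical copies, not Hadoop's `org.apache.hadoop.io.retry` types) and a `java.lang.reflect.Proxy` wrapper that retries only annotated methods. Retrying an at-most-once call is only safe if the server detects duplicates, which this sketch simply assumes. The `DemoProtocol` method names are illustrative and do not describe which ApplicationMasterProtocol methods received which annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Local stand-in: repeating the call has no additional effect. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Idempotent {}

/** Local stand-in: the server is assumed to detect and drop duplicate calls. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AtMostOnce {}

/** Hypothetical protocol; the method names are illustrative only. */
interface DemoProtocol {
    @Idempotent
    String register(String host);

    @AtMostOnce
    String finish(String status);

    String unannotated();
}

final class RetryingProxy {
    private RetryingProxy() {}

    /** Retries failed calls up to maxAttempts times, but only for annotated methods. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, int maxAttempts) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean retriable = method.isAnnotationPresent(Idempotent.class)
                    || method.isAnnotationPresent(AtMostOnce.class);
            int attempts = retriable ? Math.max(1, maxAttempts) : 1;
            RuntimeException last = null;
            for (int i = 0; i < attempts; i++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    Throwable cause = e.getCause();
                    if (!(cause instanceof RuntimeException)) {
                        throw cause;  // checked failures are not retried in this sketch
                    }
                    last = (RuntimeException) cause;
                }
            }
            throw last;
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

Unannotated methods get exactly one attempt, which mirrors the conservative default: without an annotation, the client cannot know whether a repeated call is safe.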
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1879: - Attachment: YARN-1879.17.patch > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, > YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, > YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2312: - Attachment: YARN-2312.2-3.patch > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch > > > After YARN-2229, {{ContainerId#getId}} will return only part of the containerId value: the > sequence number of the container ID, without the epoch. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
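The problem with `getId` can be shown with a toy ID class. This sketch assumes the epoch occupies the bits above the low 40 bits of a 64-bit container ID. That split is an assumption for illustration, and `DemoContainerId` is hypothetical, not YARN's `ContainerId`.

```java
/** Hypothetical stand-in for a container ID that carries an epoch; not YARN's ContainerId. */
final class DemoContainerId {
    // Assumed layout: sequence number in the low 40 bits, epoch in the bits above.
    static final int EPOCH_SHIFT = 40;
    static final long SEQUENCE_MASK = (1L << EPOCH_SHIFT) - 1;

    private final long containerId;

    DemoContainerId(long epoch, long sequence) {
        this.containerId = (epoch << EPOCH_SHIFT) | (sequence & SEQUENCE_MASK);
    }

    /** Full value, including the epoch. */
    long getContainerId() {
        return containerId;
    }

    /** Only the sequence bits, truncated to int: the epoch is lost. */
    @Deprecated
    int getId() {
        return (int) (containerId & SEQUENCE_MASK);
    }

    long getEpoch() {
        return containerId >>> EPOCH_SHIFT;
    }
}
```

Two IDs issued in different epochs with the same sequence number compare equal through the deprecated `getId()`, but not through `getContainerId()`, which is why callers need to move to the long-valued accessor.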
[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155467#comment-14155467 ] Karthik Kambatla commented on YARN-2254: Patch looks mostly good. One nit: can we rename ALLOC_FILE to FS_ALLOC_FILE and "test-queues.xml" to "test-fs-queues.xml" to make clear that the files are used only for FairScheduler? > change TestRMWebServicesAppsModification to support FairScheduler. > -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch > > > TestRMWebServicesAppsModification skips the test if the scheduler is not > CapacityScheduler. > Change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
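A helper along these lines could let the test run against either scheduler. The configuration keys `yarn.resourcemanager.scheduler.class` and `yarn.scheduler.fair.allocation.file` and the scheduler class names are real. The `SchedulerTestConf` class, its method names and the single-queue allocation file are hypothetical, and writing the XML to test-fs-queues.xml is left to the caller.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical test helper; only the configuration keys and scheduler class names are real. */
final class SchedulerTestConf {
    static final String SCHEDULER_CLASS_KEY = "yarn.resourcemanager.scheduler.class";
    static final String FS_ALLOC_FILE_KEY = "yarn.scheduler.fair.allocation.file";
    static final String FAIR_SCHEDULER =
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler";
    static final String CAPACITY_SCHEDULER =
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";

    private SchedulerTestConf() {}

    /** Minimal FairScheduler allocation file content with a single "default" queue. */
    static String fsAllocationXml() {
        return "<?xml version=\"1.0\"?>\n"
            + "<allocations>\n"
            + "  <queue name=\"default\"/>\n"
            + "</allocations>\n";
    }

    /** Config entries for one scheduler; the allocation file is wired in only for FairScheduler. */
    static Map<String, String> forScheduler(String schedulerClass, String fsAllocFilePath) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SCHEDULER_CLASS_KEY, schedulerClass);
        if (FAIR_SCHEDULER.equals(schedulerClass)) {
            conf.put(FS_ALLOC_FILE_KEY, fsAllocFilePath);
        }
        return conf;
    }
}
```

Keeping the FairScheduler-only pieces behind an explicit check matches the intent of the FS_ALLOC_FILE rename: a CapacityScheduler run never sees the allocation file setting.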
[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155473#comment-14155473 ] Tsuyoshi OZAWA commented on YARN-2312: -- I cannot reproduce the findbugs warning. Let me check the reason on Jenkins. > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch > > > After YARN-2229, {{ContainerId#getId}} will return only part of the containerId value: the > sequence number of the container ID, without the epoch. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155477#comment-14155477 ] Hadoop QA commented on YARN-2617: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672391/YARN-2617.5.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager org.apache.hadoop.yarn.server.nodemanager.TestEventFlow org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5205//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5205//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5205//console This message is automatically generated. 
> NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.patch > > > We ([~chenchun]) are testing RM work-preserving restart and found the > following logs when we ran a simple MapReduce job "PI". The NM continuously > reported completed containers whose application, and its AM, > had already finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on the NM should guarantee to clean > up already completed applications. However, it only removes the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED', and the NM might > not receive this event for a long time, or at all. 
> * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec (3 hours) by default, > before it is scheduled to delete the application logs and send the event. > * For LogAggregationService, log aggregation might fail (e.g. if the user does not have HDFS > write permission), in which case it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155481#comment-14155481 ] Hadoop QA commented on YARN-1879: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672394/YARN-1879.17.patch against trunk revision 875aa79. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5206//console This message is automatically generated. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, > YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, > YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Attachment: YARN-913-016.patch Patch -016: includes the registry CLI patch (-002) of YARN-2616. > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster, you can't predict where services will come up, or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bind to, > and not any others in the cluster. > Some kind of service registry (in the RM, or in ZK) could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
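As a toy model of the access rule described above (any client may resolve an entry, only a trusted writer may publish one), the sketch below uses an in-memory map in place of ZooKeeper. `ToyRegistry`, `Endpoint` and the path layout in the usage example are hypothetical and do not reflect the registry API in the attached patches.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical host/port pair published by a service instance. */
final class Endpoint {
    final String host;
    final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}

/** In-memory stand-in for a registry; a real one would be backed by ZooKeeper. */
final class ToyRegistry {
    private final Map<String, Endpoint> entries = new ConcurrentHashMap<>();
    private final String trustedWriter;

    ToyRegistry(String trustedWriter) {
        this.trustedWriter = trustedWriter;
    }

    /** Only the trusted writer (standing in for the RM) may publish entries. */
    void register(String caller, String path, Endpoint endpoint) {
        if (!trustedWriter.equals(caller)) {
            throw new SecurityException(caller + " may not write " + path);
        }
        entries.put(path, endpoint);
    }

    /** Any client may look up the specific instance it wants to bind to. */
    Optional<Endpoint> resolve(String path) {
        return Optional.ofNullable(entries.get(path));
    }
}
```

Resolving by a full instance path, rather than by service type alone, is what lets an application find its own instance and not any others in the cluster.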
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155530#comment-14155530 ] Steve Loughran commented on YARN-2616: -- The patch I just posted doesn't {{stop()}} the registry service, so it will leak a Curator instance and its threads. > Add CLI client to the registry to list/view entries > --- > > Key: YARN-2616 > URL: https://issues.apache.org/jira/browse/YARN-2616 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Steve Loughran >Assignee: Akshay Radia > Attachments: yarn-2616-v1.patch, yarn-2616-v2.patch > > > The registry needs a CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
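A common way to avoid the leak described here is to tie the service's lifetime to a try-with-resources block. In the minimal sketch below, the hypothetical `RegistryCliClient` owns one background thread standing in for a Curator instance's threads, and `close()` plays the role of `stop()`.

```java
/** Hypothetical client owning a background thread, standing in for a Curator-backed service. */
final class RegistryCliClient implements AutoCloseable {
    private final Thread worker;

    RegistryCliClient() {
        worker = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);  // idles until interrupted
            } catch (InterruptedException e) {
                // Interrupted by close(): let the thread finish.
            }
        }, "registry-worker");
        worker.setDaemon(true);
        worker.start();
    }

    boolean isWorkerAlive() {
        return worker.isAlive();
    }

    /** Plays the role of stop(): interrupts the worker and waits for it to exit. */
    @Override
    public void close() {
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Hadoop's `org.apache.hadoop.service.Service` extends `Closeable`, so the same try-with-resources pattern should apply to a real registry service in a CLI's main path, though it is worth checking against the Hadoop version in use.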
[jira] [Updated] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2616: - Attachment: YARN-2616-003.patch > Add CLI client to the registry to list/view entries > --- > > Key: YARN-2616 > URL: https://issues.apache.org/jira/browse/YARN-2616 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Steve Loughran >Assignee: Akshay Radia > Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, > yarn-2616-v2.patch > > > The registry needs a CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2254: Attachment: YARN-2254.004.patch > change TestRMWebServicesAppsModification to support FairScheduler. > -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test if the scheduler is not > CapacityScheduler. > Change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1418#comment-1418 ] zhihai xu commented on YARN-2254: - Hi [~kasha], good suggestion. I uploaded a new patch, YARN-2254.004.patch, to address the comments. Thanks. > change TestRMWebServicesAppsModification to support FairScheduler. > -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test if the scheduler is not > CapacityScheduler. > Change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1879: - Attachment: YARN-1879.18.patch Rebased on trunk. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, > YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, > YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155565#comment-14155565 ] Hadoop QA commented on YARN-2630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672374/YARN-2630.4.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5204//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5204//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5204//console This message is automatically generated. 
> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, > YARN-2630.4.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM also receives the previously failed AM container as a completed > container, but the DistributedShell logic does not expect this extra completed container.
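Here is a minimal sketch of one way an AM could avoid miscounting such an unexpected completion. It assumes container IDs are tracked as plain strings, whereas a real AM would use YARN `ContainerId` objects. The class and method names are hypothetical and are not taken from DistributedShell or from the attached patches. The idea is to count a completed container only if this attempt knows about it, meaning it either launched the container itself or was told about it as a container from a previous attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: container IDs are plain strings, not YARN ContainerId objects.
class CompletedContainerFilter {
    // Containers this AM attempt knows about: ones it launched itself, plus any
    // it was told are still running from a previous attempt.
    private final Set<String> known = new HashSet<>();

    void recordKnown(String containerId) {
        known.add(containerId);
    }

    // Keep only completions for known containers. A completion for an unknown ID,
    // such as the previously failed AM container, is skipped rather than being
    // counted toward the expected number of completed containers.
    List<String> relevantCompletions(List<String> completedIds) {
        List<String> relevant = new ArrayList<>();
        for (String id : completedIds) {
            if (known.contains(id)) {
                relevant.add(id);
            }
        }
        return relevant;
    }

    public static void main(String[] args) {
        CompletedContainerFilter filter = new CompletedContainerFilter();
        filter.recordKnown("container_02"); // still running from the previous attempt
        filter.recordKnown("container_03"); // launched by this attempt
        // container_01 stands in for the previous attempt's failed AM container.
        List<String> completed = Arrays.asList("container_01", "container_02");
        System.out.println(filter.relevantCompletions(completed)); // prints [container_02]
    }
}
```

Filtering on a known set avoids hard-coding which container ID the old AM had. The trade-off is that the AM must register every container it inherits from earlier attempts before their completions arrive.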
[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155571#comment-14155571 ] Karthik Kambatla commented on YARN-2254: +1, pending Jenkins. I'll commit this later today. > change TestRMWebServicesAppsModification to support FairScheduler. > -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test if the scheduler is not > CapacityScheduler. > Change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler.
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155584#comment-14155584 ] Jian He commented on YARN-2628: --- Looks good. One minor comment on the test case: - the following assertion depends on timing; because the allocation happens asynchronously, it might fail. Could you use a loop that checks whether the container has been allocated, and times out otherwise? {code} Thread.sleep(1000); allocResponse = am1.schedule(); Assert.assertEquals(1, allocResponse.getAllocatedContainers().size()); {code} > Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, apps sometimes end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java: > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check whether a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true even though we may not have enough CPU or > memory to actually run the container.
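The sketch below is a simplified illustration of why a dominant-share comparison can pass while one resource is exhausted. It assumes a two-dimensional resource (memory, vcores) held in a hypothetical `Res` class, and it approximates the calculator's comparison by dominant share alone. That is our simplified reading, not Hadoop's actual `DominantResourceCalculator` code. The component-wise `fitsIn` check appears only for contrast and is not presented as the fix adopted in this issue.

```java
// Hypothetical two-dimensional resource; not Hadoop's Resource class.
final class Res {
    final long memoryMb;
    final long vcores;

    Res(long memoryMb, long vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

class DominantShareCheck {
    // Dominant share: the largest fraction of the cluster that this resource represents.
    static double dominantShare(Res r, Res cluster) {
        return Math.max((double) r.memoryMb / cluster.memoryMb,
                        (double) r.vcores / cluster.vcores);
    }

    // Simplified dominant-share comparison (tie-breaking on the other resource ignored).
    static boolean dominantGreaterOrEqual(Res lhs, Res rhs, Res cluster) {
        return dominantShare(lhs, cluster) >= dominantShare(rhs, cluster);
    }

    // Component-wise check: every resource type must fit on its own.
    static boolean fitsIn(Res request, Res available) {
        return request.memoryMb <= available.memoryMb
            && request.vcores <= available.vcores;
    }

    public static void main(String[] args) {
        Res cluster = new Res(8192, 8);
        Res available = new Res(0, 4);            // node has no memory left, but free vcores
        Res minimumAllocation = new Res(1024, 1);

        // Available dominant share is 0.5 (vcores); the minimum allocation's is 0.125.
        System.out.println(dominantGreaterOrEqual(available, minimumAllocation, cluster)); // true
        // Yet a minimum-sized container cannot be placed: 1024 MB > 0 MB.
        System.out.println(fitsIn(minimumAllocation, available)); // false
    }
}
```

In this example, the node passes the "has room" check because its spare vcores dominate. No container can start there, though, which is consistent with the reservations described above.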
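Jian He's suggestion to poll with a timeout instead of using a fixed `Thread.sleep(1000)` could look roughly like the generic helper below. The `WaitFor` class is hypothetical. Hadoop's test utilities also provide a similar `GenericTestUtils.waitFor`, which may be preferable inside the Hadoop tree. In the test itself, the condition would call `am1.schedule()` and add `allocResponse.getAllocatedContainers().size()` to a running total, on the assumption that each call reports only newly allocated containers.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper, used instead of a fixed sleep in a timing-sensitive test.
class WaitFor {
    // Re-evaluates the condition every intervalMs until it is true (returns true)
    // or timeoutMs has elapsed (returns false).
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "has the container been allocated yet?": true on the third poll.
        int[] polls = {0};
        boolean allocated = waitFor(() -> ++polls[0] >= 3, 100, 10_000);
        System.out.println(allocated + " after " + polls[0] + " polls"); // true after 3 polls
    }
}
```

The test would then assert on the helper's return value, so a slow allocation fails with a timeout instead of a spurious count mismatch.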
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155624#comment-14155624 ] Steve Loughran commented on YARN-2616: -- Features of the 003 patch: # registry instance created via a factory # uses the configuration instance built up on the command line (though it is also creating a {{YarnConfiguration()}} around that) # pulls all exception-to-error-text mapping out into a single method # covers the current set of errors # also logs @ debug if enabled. > Add CLI client to the registry to list/view entries > --- > > Key: YARN-2616 > URL: https://issues.apache.org/jira/browse/YARN-2616 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Steve Loughran >Assignee: Akshay Radia > Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, > yarn-2616-v2.patch > > > registry needs a CLI interface
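A "single method" for exception-to-error-text mapping might look something like the sketch below. The class name, exit codes and message wording are hypothetical and are not taken from the 003 patch. The sketch uses only JDK exception types, plus `java.util.logging` for the debug-level stack trace, whereas Hadoop code would normally go through its own logging facade.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical CLI error mapper: one place that turns an exception into
// user-facing text and an exit code.
class CliErrorMapper {
    private static final Logger LOG = Logger.getLogger(CliErrorMapper.class.getName());

    // Hypothetical exit codes, for illustration only.
    static final int EXIT_OTHER = 1;
    static final int EXIT_NOT_FOUND = 2;
    static final int EXIT_ACCESS_DENIED = 3;
    static final int EXIT_IO = 4;

    static int describe(String operation, Exception e, StringBuilder out) {
        int code;
        // Most specific types first: FileNotFoundException is an IOException.
        if (e instanceof FileNotFoundException) {
            out.append(operation).append(": path not found: ").append(e.getMessage());
            code = EXIT_NOT_FOUND;
        } else if (e instanceof SecurityException) {
            out.append(operation).append(": permission denied: ").append(e.getMessage());
            code = EXIT_ACCESS_DENIED;
        } else if (e instanceof IOException) {
            out.append(operation).append(": I/O error: ").append(e.getMessage());
            code = EXIT_IO;
        } else {
            out.append(operation).append(": ").append(e);
            code = EXIT_OTHER;
        }
        // Full stack trace only when debug-level (FINE) logging is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.log(Level.FINE, operation + " failed", e);
        }
        return code;
    }

    public static void main(String[] args) {
        StringBuilder msg = new StringBuilder();
        int code = describe("ls", new FileNotFoundException("/users/alice"), msg);
        System.out.println(code + " " + msg); // 2 ls: path not found: /users/alice
    }
}
```

Keeping the mapping in one method means a new error case needs one new branch rather than changes at every call site.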
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155638#comment-14155638 ] Joep Rottinghuis commented on YARN-1414: @sandyr, could we get some love on this JIRA? We're essentially running with a forked FairScheduler and would like to reduce tech debt each time we uprev to a newer version. > with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs > - > > Key: YARN-1414 > URL: https://issues.apache.org/jira/browse/YARN-1414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.2.0 > > Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch > >
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155643#comment-14155643 ] Zhijie Shen commented on YARN-2583: --- Per offline discussion: 1. In the AggregatedLogDeletionService of the JHS, we delete the log files of completed apps, and in the AppLogAggregatorImpl of the NM, we delete the log files of running LRS. We need to add a test case to verify that AggregatedLogDeletionService won't delete the logs of a running LRS. 2. We apply the same retention policy on both sides, using time to determine which log files need to be deleted. 3. For scalability, let's keep the criterion of the number of logs per app, in case the rolling interval is small and too many log files are generated. But let's keep the config private to AppLogAggregatorImpl. > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch > > > Currently, AggregatedLogDeletionService deletes old logs from HDFS. It > checks the cut-off time: if all logs for an application are older than > the cut-off time, the app-log-dir is deleted from HDFS. This does not > work for LRS, because we expect an LRS application to keep running for a long time. > There are two scenarios: > 1) If rollingIntervalSeconds is configured, new log files are > continually uploaded to HDFS. The number of log files for the application keeps > growing, and no log files are ever deleted. > 2) If rollingIntervalSeconds is not configured, the log files can only > be uploaded to HDFS after the application has finished. It is quite possible > that the logs are uploaded after the cut-off time. 
This causes a problem > because by then the app-log-dir for this application in HDFS has already been > deleted.
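Points 2 and 3 of the discussion, a time-based retention policy plus a cap on the number of log files per app, could be combined roughly as in the sketch below, which selects the files to delete for a single running LRS. The `LogFile` type and `selectForDeletion` method are hypothetical and are not code from AppLogAggregatorImpl or the attached patch. The cap is passed in as a plain parameter rather than read from any real configuration key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of per-app log retention for a long-running service.
class LogRetention {
    static final class LogFile {
        final String name;
        final long modTimeMs;

        LogFile(String name, long modTimeMs) {
            this.name = name;
            this.modTimeMs = modTimeMs;
        }
    }

    // Returns the files to delete: everything older than the cut-off time, plus
    // the oldest remaining files once more than maxFilesPerApp would be kept.
    static List<LogFile> selectForDeletion(List<LogFile> files, long nowMs,
                                           long retainMs, int maxFilesPerApp) {
        long cutoffMs = nowMs - retainMs;
        List<LogFile> toDelete = new ArrayList<>();
        List<LogFile> kept = new ArrayList<>();
        for (LogFile f : files) {
            if (f.modTimeMs < cutoffMs) {
                toDelete.add(f);            // time-based retention
            } else {
                kept.add(f);
            }
        }
        // Newest first, so anything past the cap is among the oldest survivors.
        kept.sort(Comparator.comparingLong((LogFile f) -> f.modTimeMs).reversed());
        for (int i = maxFilesPerApp; i < kept.size(); i++) {
            toDelete.add(kept.get(i));      // count-based cap
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<LogFile> files = new ArrayList<>();
        files.add(new LogFile("a", 100));
        files.add(new LogFile("b", 600));
        files.add(new LogFile("c", 700));
        files.add(new LogFile("d", 800));
        // Cut-off is 1000 - 500 = 500, so "a" expires; the cap of 2 also drops "b".
        for (LogFile f : selectForDeletion(files, 1000, 500, 2)) {
            System.out.println(f.name);
        }
    }
}
```

The count cap only matters when the rolling interval is short enough that many files land inside the retention window, which is the scalability case raised in point 3.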
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155649#comment-14155649 ] Hadoop QA commented on YARN-1414: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632578/YARN-1221-v2.patch against trunk revision dd1b8f2. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5211//console This message is automatically generated. > with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs > - > > Key: YARN-1414 > URL: https://issues.apache.org/jira/browse/YARN-1414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.2.0 > > Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch > >
[jira] [Created] (YARN-2637) maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation
Wangda Tan created YARN-2637: Summary: maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Priority: Critical Currently, the number of AMs in a leaf queue is calculated as follows:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
When a new application is submitted to the RM, the RM checks whether the app can be activated as follows:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
     i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() <
      getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() +
        " from user: " + application.getUser() +
        " activated in queue: " + getQueueName());
  }
}
{code}
For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, up to 200 AMs can be launched. If the user uses 5M for each AM (> minimum_allocation), all 200 apps can still be activated, and together they occupy the whole queue (200 * 5M = 1000M) instead of only max_am_resource_percent of it.
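The arithmetic in the example above can be checked with the small sketch below, which treats 1G as 1000M as the example does. `maxActiveAppsByCount` follows the count-based formula quoted above. `admittedByResource` is a hypothetical resource-based alternative, included only to contrast the two limits and not as the fix chosen for this issue.

```java
// Hypothetical sketch contrasting a count-based AM limit with a resource-based one.
class AmLimitExample {
    // Count-based limit from the issue: max_am_resource / minimum_allocation.
    static long maxActiveAppsByCount(long queueMaxCapacityMb, double maxAmResourcePercent,
                                     long minimumAllocationMb) {
        long maxAmResourceMb = (long) (queueMaxCapacityMb * maxAmResourcePercent);
        return maxAmResourceMb / minimumAllocationMb;
    }

    // Resource-based alternative: keep activating AMs while the total AM
    // resource stays within max_am_resource.
    static int admittedByResource(long queueMaxCapacityMb, double maxAmResourcePercent,
                                  long amResourceMb, int pendingApps) {
        long maxAmResourceMb = (long) (queueMaxCapacityMb * maxAmResourcePercent);
        long usedMb = 0;
        int admitted = 0;
        while (admitted < pendingApps && usedMb + amResourceMb <= maxAmResourceMb) {
            usedMb += amResourceMb;
            admitted++;
        }
        return admitted;
    }

    public static void main(String[] args) {
        long maxApps = maxActiveAppsByCount(1000, 0.2, 1);  // 200
        System.out.println("count-based limit: " + maxApps + " AMs");
        System.out.println("AM usage at 5M each: " + (maxApps * 5) + "M of 1000M");
        System.out.println("resource-based limit: "
            + admittedByResource(1000, 0.2, 5, 200) + " AMs");
    }
}
```

With 5M AMs, the count-based limit admits 200 AMs that use the full 1000M. A resource-based check would stop at 40 AMs, which matches the intended 20% (200M).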
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2637: - Summary: maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation (was: maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation) > maximum-am-resource-percent could be violated when resource of AM is > > minimumAllocation > > > Key: YARN-2637 > URL: https://issues.apache.org/jira/browse/YARN-2637 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Wangda Tan >Priority: Critical > > Currently, the number of AMs in a leaf queue is calculated as follows: > {code} > max_am_resource = queue_max_capacity * maximum_am_resource_percent > #max_am_number = max_am_resource / minimum_allocation > #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor > {code} > When a new application is submitted to the RM, the RM checks whether the app can be > activated as follows: > {code} > for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); > i.hasNext(); ) { > FiCaSchedulerApp application = i.next(); > > // Check queue limit > if (getNumActiveApplications() >= getMaximumActiveApplications()) { > break; > } > > // Check user limit > User user = getUser(application.getUser()); > if (user.getActiveApplications() < > getMaximumActiveApplicationsPerUser()) { > user.activateApplication(); > activeApplications.add(application); > i.remove(); > LOG.info("Application " + application.getApplicationId() + > " from user: " + application.getUser() + > " activated in queue: " + getQueueName()); > } > } > {code} > For example, > if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum > resource that AMs can use is 200M. Assuming minimum_allocation = 1M, up to 200 AMs can be > launched. 
If the user uses 5M for each AM (> minimum_allocation), all > 200 apps can still be activated, and together they occupy the whole queue (200 * 5M = 1000M) > instead of only max_am_resource_percent of it.
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155664#comment-14155664 ] Hadoop QA commented on YARN-913: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672406/YARN-913-016.patch against trunk revision 875aa79. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 36 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1266 javac compiler warnings (more than the trunk's current 1265 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5208//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5208//console This message is automatically generated. > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up, or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to, > and not any others in the cluster. > Some kind of service registry (in the RM, or in ZK) could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves.
[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155669#comment-14155669 ] Hadoop QA commented on YARN-2254: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672416/YARN-2254.004.patch against trunk revision 875aa79. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5209//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5209//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5209//console This message is automatically generated. > change TestRMWebServicesAppsModification to support FairScheduler. 
> -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test if the scheduler is not > CapacityScheduler. > Change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler.