[jira] [Updated] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-3601: -- Attachment: YARN-3601.001.patch This patch fixes UT TestRMFailover.testRMWebAppRedirect; it covers all the tests for RM web app redirection. Fix UT TestRMFailover.testRMWebAppRedirect -- Key: YARN-3601 URL: https://issues.apache.org/jira/browse/YARN-3601 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Environment: Red Hat Enterprise Linux Workstation release 6.5 (Santiago) Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Labels: test Attachments: YARN-3601.001.patch This test case has not been working since the commit from YARN-2605; it fails with a NullPointerException (NPE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3654) ContainerLogsPage web UI should not have meta-refresh
[ https://issues.apache.org/jira/browse/YARN-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3654: Attachment: YARN-3654.2.patch ContainerLogsPage web UI should not have meta-refresh - Key: YARN-3654 URL: https://issues.apache.org/jira/browse/YARN-3654 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.1 Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3654.1.patch, YARN-3654.2.patch Currently, when we try to find the container logs for a finished application, the page redirects to the URL configured as yarn.log.server.url in yarn-site.xml. But in ContainerLogsPage, we are using meta-refresh: {code} set(TITLE, join("Redirecting to log server for ", $(CONTAINER_ID))); html.meta_http("refresh", "1; url=" + redirectUrl); {code} which does not work well in browsers that require meta-refresh to be enabled in their security settings, especially IE, where meta-refresh is considered a security hole. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
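For illustration, one alternative to the meta-refresh is to issue the redirect server-side as an HTTP 302, so the browser's meta-refresh security setting never comes into play. A minimal servlet sketch follows; the class and field names are hypothetical, and this is not necessarily the approach taken in the attached patches.
{code}
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sketch: send an HTTP 302 to the log server instead of emitting
// a meta-refresh tag in the page.
public class LogServerRedirectServlet extends HttpServlet {

  // Assumed to be resolved elsewhere from yarn.log.server.url plus the container id.
  private final String redirectUrl;

  public LogServerRedirectServlet(String redirectUrl) {
    this.redirectUrl = redirectUrl;
  }

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // 302 Found with the Location header set; no client-side refresh is required.
    resp.sendRedirect(resp.encodeRedirectURL(redirectUrl));
  }
}
{code}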
[jira] [Updated] (YARN-3560) Not able to navigate to the cluster from tracking url (proxy) generated after submission of job
[ https://issues.apache.org/jira/browse/YARN-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anushri updated YARN-3560: -- Assignee: Anushri Not able to navigate to the cluster from tracking url (proxy) generated after submission of job --- Key: YARN-3560 URL: https://issues.apache.org/jira/browse/YARN-3560 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Anushri Assignee: Anushri Priority: Minor Attachments: YARN-3560.patch A standalone web proxy server is enabled in the cluster. When a job is submitted, the generated tracking URL goes through the proxy. If we open this URL in the browser and try to navigate to the cluster links [about, applications, or scheduler], we get redirected to some default port instead of the RM web port actually configured, and the browser reports "webpage not available". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549949#comment-14549949 ] Arun Suresh commented on YARN-3633: --- Thanks for the patch [~ragarwal]. Assuming we allow, as per the patch, the first AM to be scheduled, then, as per the example you specified in the description, the AM will take up 3GB in a 5GB queue... presuming each worker task requires more resources than the AM (I am guessing this should be true for most cases), then no other task can be scheduled on that queue, and the remaining queues are anyway log-jammed since the maxAMShare logic would kick in. Wondering if it's a valid scenario.. With Fair Scheduler, cluster can logjam when there are too many queues -- Key: YARN-3633 URL: https://issues.apache.org/jira/browse/YARN-3633 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Critical Attachments: YARN-3633.patch It's possible to logjam a cluster by submitting many applications at once in different queues. For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users submit applications at the same time. The fair share of each queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the cluster logjams. Nothing gets scheduled even when 20GB of resources are available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
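To make the arithmetic concrete: with a 20GB cluster split into four queues, each queue's fair share is 5GB, and maxAMShare 0.5 caps AM usage at 2.5GB per queue, so a 3GB AM can never start anywhere. A hedged sketch of the kind of check under discussion follows; the identifiers are illustrative, not the exact FSLeafQueue code or the submitted patch.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative sketch: admit an AM if it fits under fairShare * maxAMShare, or --
// the relaxation discussed in this JIRA -- if the queue has no running AMs yet,
// so a queue can never be starved of its very first AM.
final class AmShareCheck {
  static boolean canRunAppAM(Resource fairShare, double maxAMShare,
      Resource amResourceUsage, Resource amResource, int runningAMs) {
    Resource maxAMResource = Resources.multiply(fairShare, maxAMShare);
    Resource ifStarted = Resources.add(amResourceUsage, amResource);
    return Resources.fitsIn(ifStarted, maxAMResource) || runningAMs == 0;
  }
}
{code}
With the numbers from the description, the first 3GB AM in each 5GB queue would then be admitted even though 3GB exceeds the 2.5GB AM share.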
[jira] [Commented] (YARN-3654) ContainerLogsPage web UI should not have meta-refresh
[ https://issues.apache.org/jira/browse/YARN-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549967#comment-14549967 ] Hadoop QA commented on YARN-3654: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 36s | The applied patch generated 2 new checkstyle issues (total was 12, now 13). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 4s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 6s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 42m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733729/YARN-3654.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0790275 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7991/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7991/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7991/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7991/console | This message was automatically generated. ContainerLogsPage web UI should not have meta-refresh - Key: YARN-3654 URL: https://issues.apache.org/jira/browse/YARN-3654 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.1 Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3654.1.patch, YARN-3654.2.patch Currently, When we try to find the container logs for the finished application, it will re-direct to the url which we re-configured for yarn.log.server.url in yarn-site.xml. But in ContainerLogsPage, we are using meta-refresh: {code} set(TITLE, join(Redirecting to log server for , $(CONTAINER_ID))); html.meta_http(refresh, 1; url= + redirectUrl); {code} which is not good for some browsers which need to enable the meta-refresh in their security setting, especially for IE which meta-refresh is considered a security hole. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549983#comment-14549983 ] Hadoop QA commented on YARN-3601: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 19s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 50s | Tests passed in hadoop-yarn-client. | | | | 23m 9s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733727/YARN-3601.001.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 93972a3 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7992/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7992/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7992/console | This message was automatically generated. Fix UT TestRMFailover.testRMWebAppRedirect -- Key: YARN-3601 URL: https://issues.apache.org/jira/browse/YARN-3601 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Environment: Red Hat Enterprise Linux Workstation release 6.5 (Santiago) Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Labels: test Attachments: YARN-3601.001.patch This test case was not working since the commit from YARN-2605. It failed with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xia Hu updated YARN-3126: - Attachment: resourcelimit-test.patch Added a unit test for this patch. FairScheduler: queue's usedResource is always more than the maxResource limit - Key: YARN-3126 URL: https://issues.apache.org/jira/browse/YARN-3126 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. Reporter: Xia Hu Labels: BB2015-05-TBR, assignContainer, fairscheduler, resources Fix For: trunk-win Attachments: resourcelimit-02.patch, resourcelimit-test.patch, resourcelimit.patch When submitting a Spark application (in both the spark-on-yarn-cluster and spark-on-yarn-client modes), the queue's usedResources assigned by the FairScheduler can always exceed the queue's maxResources limit. From reading the FairScheduler code, I suppose this happens because the requested resources are not checked when assigning a container. Here is the detail: 1. Choose a queue. In this step, assignContainerPreCheck checks whether the queue's usedResource is already bigger than its max. 2. Then choose an app in that queue. 3. Then choose a container. And here is the problem: there is no check whether this container would push the queue's resources over its max limit. If a queue's usedResource is 13G and the maxResource limit is 16G, a container asking for 4G may still be assigned successfully. This problem shows up easily with Spark applications, because we can ask for different container resources in different applications. By the way, I have already applied the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
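A hedged sketch of the missing check described above (the names are illustrative, not the exact FairScheduler code): before assigning, verify that the queue's current usage plus the new container still fits under the queue's maximum.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative sketch: with usedResource = 13G and maxResource = 16G, a 4G request
// fails this check (13G + 4G does not fit in 16G) -- exactly the assignment the
// description says is currently allowed through.
final class QueueMaxCheck {
  static boolean fitsUnderMaxShare(Resource queueUsage, Resource request,
      Resource queueMaxShare) {
    return Resources.fitsIn(Resources.add(queueUsage, request), queueMaxShare);
  }
}
{code}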
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549848#comment-14549848 ] Zhijie Shen commented on YARN-3411: --- [~vrushalic], thanks for working on the patch. Some comments from my side: 1. I saw that in the HBase implementation the flow version is not included as part of the row key. This is a bit different from the primary key design of the Phoenix implementation. Would you mind elaborating your rationale a bit? 2. Shall we make the constants in TimelineEntitySchemaConstants follow the Hadoop convention? We can keep them in this class for now. Once we decide to move on with the HBase impl, we should move (some of) them into YarnConfiguration as API. 3. I saw the classes are marked \@Public, but they're backend classes and not accessible by the user directly. In fact, you can leave these classes unannotated. 4. According to TimelineSchemaCreator, we need to run a command-line tool to create the table when we set up the backend, right? Can we include creating the table in the lifecycle of HBaseTimelineWriterImpl? [Storage implementation] explore the native HBase write schema for storage -- Key: YARN-3411 URL: https://issues.apache.org/jira/browse/YARN-3411 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Priority: Critical Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, YARN-3411.poc.txt There is work in progress to implement the storage based on a Phoenix schema (YARN-3134). In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries. Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
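On point 4, a hedged sketch of what creating the table from the writer's own lifecycle could look like; the table name and column family below are assumptions for illustration, not the schema from the attached patches.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Illustrative sketch: create the entity table lazily from the writer itself
// (e.g. during serviceInit) instead of requiring a separate TimelineSchemaCreator run.
final class TimelineTableBootstrap {
  static void ensureEntityTable(Configuration conf) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName entityTable = TableName.valueOf("timelineservice.entity"); // assumed name
      if (!admin.tableExists(entityTable)) {
        HTableDescriptor desc = new HTableDescriptor(entityTable);
        desc.addFamily(new HColumnDescriptor("i")); // assumed "info" column family
        admin.createTable(desc);
      }
    }
  }
}
{code}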
[jira] [Commented] (YARN-3674) YARN application disappears from view
[ https://issues.apache.org/jira/browse/YARN-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549858#comment-14549858 ] Siddharth Seth commented on YARN-3674: -- Clicking on a specific queue on the scheduler page, followed by a click on the 'Applications' / 'RUNNING' / etc links - ends up on a page which shows no indication that a queue has been selected. It ends up looking like the cluster isn't RUNNING anything, or hasn't run anything, if the queue isn't used. For [~sershe] - this was worse. Going back and selecting the default queue made no difference to the apps listing. YARN application disappears from view - Key: YARN-3674 URL: https://issues.apache.org/jira/browse/YARN-3674 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Sergey Shelukhin I have 2 tabs open at the exact same URL with the RUNNING applications view. There is an application that is, in fact, running, that is visible in one tab but not the other. This persists across refreshes. If I open a new tab from the tab where the application is not visible, in that tab it shows up ok. I didn't change scheduler/queue settings before this behavior happened; on [~sseth]'s advice I went and tried to click the root node of the scheduler on the scheduler page; the app still does not become visible. Something got stuck somewhere... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3654) ContainerLogsPage web UI should not have meta-refresh
[ https://issues.apache.org/jira/browse/YARN-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549875#comment-14549875 ] Xuan Gong commented on YARN-3654: - Fixed the -1 on findbugs. ContainerLogsPage web UI should not have meta-refresh - Key: YARN-3654 URL: https://issues.apache.org/jira/browse/YARN-3654 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.1 Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3654.1.patch, YARN-3654.2.patch Currently, when we try to find the container logs for a finished application, the page redirects to the URL configured as yarn.log.server.url in yarn-site.xml. But in ContainerLogsPage, we are using meta-refresh: {code} set(TITLE, join("Redirecting to log server for ", $(CONTAINER_ID))); html.meta_http("refresh", "1; url=" + redirectUrl); {code} which does not work well in browsers that require meta-refresh to be enabled in their security settings, especially IE, where meta-refresh is considered a security hole. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3560) Not able to navigate to the cluster from tracking url (proxy) generated after submission of job
[ https://issues.apache.org/jira/browse/YARN-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anushri updated YARN-3560: -- Assignee: (was: Anushri) Not able to navigate to the cluster from tracking url (proxy) generated after submission of job --- Key: YARN-3560 URL: https://issues.apache.org/jira/browse/YARN-3560 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Anushri Priority: Minor Attachments: YARN-3560.patch A standalone web proxy server is enabled in the cluster. When a job is submitted, the generated tracking URL goes through the proxy. If we open this URL in the browser and try to navigate to the cluster links [about, applications, or scheduler], we get redirected to some default port instead of the RM web port actually configured, and the browser reports "webpage not available". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raju Bairishetti updated YARN-3646: --- Attachment: YARN-3646.patch Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3677) Fix findbugs warnings in FileSystemRMStateStore.java
Akira AJISAKA created YARN-3677: --- Summary: Fix findbugs warnings in FileSystemRMStateStore.java Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Priority: Minor There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3674) YARN application disappears from view
[ https://issues.apache.org/jira/browse/YARN-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549928#comment-14549928 ] Rohith commented on YARN-3674: -- Is this dup of YARN-2238? YARN application disappears from view - Key: YARN-3674 URL: https://issues.apache.org/jira/browse/YARN-3674 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Sergey Shelukhin I have 2 tabs open at exact same URL with RUNNING applications view. There is an application that is, in fact, running, that is visible in one tab but not the other. This persists across refreshes. If I open new tab from the tab where the application is not visible, in that tab it shows up ok. I didn't change scheduler/queue settings before this behavior happened; on [~sseth]'s advice I went and tried to click the root node of the scheduler on scheduler page; the app still does not become visible. Something got stuck somewhere... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549943#comment-14549943 ] Weiwei Yang commented on YARN-3601: --- I set a flag to false so that HttpURLConnection does NOT automatically follow the redirect; this fixes the "too many redirections" problem. (In the past this was not a problem because there was a refresh time of 3 seconds, so the client was still able to retrieve the redirect URL from the HTTP header.) I am now able to retrieve the redirection URL from the Location header field, and get null if there is no redirection. The overall logic is not changed, and the test case passes now. Fix UT TestRMFailover.testRMWebAppRedirect -- Key: YARN-3601 URL: https://issues.apache.org/jira/browse/YARN-3601 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Environment: Red Hat Enterprise Linux Workstation release 6.5 (Santiago) Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Labels: test Attachments: YARN-3601.001.patch This test case has not been working since the commit from YARN-2605; it fails with a NullPointerException (NPE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
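A minimal sketch of the mechanism described in the comment above (not the patch verbatim): with automatic redirect-following disabled, the redirect target can be read straight from the Location header, and a null return means no redirection happened.
{code}
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch: probe a web endpoint for its redirect target without following it.
final class RedirectProbe {
  static String getRedirectUrl(String url) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setInstanceFollowRedirects(false); // the "false flag": do not follow redirects
    conn.connect();
    String location = conn.getHeaderField("Location"); // null if there is no redirection
    conn.disconnect();
    return location;
  }
}
{code}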
[jira] [Commented] (YARN-3633) With Fair Scheduler, cluster can logjam when there are too many queues
[ https://issues.apache.org/jira/browse/YARN-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550008#comment-14550008 ] Rohit Agarwal commented on YARN-3633: - Other non-AM containers can be scheduled in the queue - unlike the maxAMShare limit, the fair share is not a hard limit. So, the FS will schedule non-AM containers in this queue when it cannot schedule AM containers in other queues. I gave a walkthrough in this comment: https://issues.apache.org/jira/browse/YARN-3633?focusedCommentId=14542895page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14542895 With Fair Scheduler, cluster can logjam when there are too many queues -- Key: YARN-3633 URL: https://issues.apache.org/jira/browse/YARN-3633 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Critical Attachments: YARN-3633.patch It's possible to logjam a cluster by submitting many applications at once in different queues. For example, let's say there is a cluster with 20GB of total memory. Let's say 4 users submit applications at the same time. The fair share of each queue is 5GB. Let's say that maxAMShare is 0.5. So, each queue has at most 2.5GB memory for AMs. If all the users requested AMs of size 3GB - the cluster logjams. Nothing gets scheduled even when 20GB of resources are available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550007#comment-14550007 ] Xia Hu commented on YARN-3126: -- I have submitted a unit test just now, review it again, thx~ FairScheduler: queue's usedResource is always more than the maxResource limit - Key: YARN-3126 URL: https://issues.apache.org/jira/browse/YARN-3126 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. Reporter: Xia Hu Labels: BB2015-05-TBR, assignContainer, fairscheduler, resources Fix For: trunk-win Attachments: resourcelimit-02.patch, resourcelimit-test.patch, resourcelimit.patch When submitting spark application(both spark-on-yarn-cluster and spark-on-yarn-cleint model), the queue's usedResources assigned by fairscheduler always can be more than the queue's maxResources limit. And by reading codes of fairscheduler, I suppose this issue happened because of ignore to check the request resources when assign Container. Here is the detail: 1. choose a queue. In this process, it will check if queue's usedResource is bigger than its max, with assignContainerPreCheck. 2. then choose a app in the certain queue. 3. then choose a container. And here is the question, there is no check whether this container would make the queue sources over its max limit. If a queue's usedResource is 13G, the maxResource limit is 16G, then a container which asking for 4G resources may be assigned successful. This problem will always happen in spark application, cause we can ask for different container resources in different applications. By the way, I have already use the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in FileSystemRMStateStore.java
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550015#comment-14550015 ] Akira AJISAKA commented on YARN-3677: - The setIsHDFS method should be synchronized. {code} @VisibleForTesting void setIsHDFS(boolean isHDFS) { this.isHDFS = isHDFS; } {code} Looks like this issue was introduced by commit 9a2a95, but there is no issue id in the commit message. Hi [~vinodkv], would you point to the JIRA related to that commit? Fix findbugs warnings in FileSystemRMStateStore.java Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Priority: Minor Labels: newbie There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
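A sketch of the one-line fix suggested in the comment above, mirroring the quoted snippet: marking the test-only setter synchronized so every access to isHDFS is consistently locked.
{code}
// Suggested change (sketch): add the synchronized keyword to the setter.
@VisibleForTesting
synchronized void setIsHDFS(boolean isHDFS) {
  this.isHDFS = isHDFS;
}
{code}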
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550092#comment-14550092 ] Hadoop QA commented on YARN-3646: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 1s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 2s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 22m 17s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | | | 63m 53s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733743/YARN-3646.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 93972a3 | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7994/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7994/console | This message was automatically generated. Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. 
But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550119#comment-14550119 ] Hadoop QA commented on YARN-3126: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 44s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 16s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 60m 19s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 77m 34s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733746/resourcelimit-test.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 93972a3 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7993/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7993/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7993/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7993/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7993/console | This message was automatically generated. FairScheduler: queue's usedResource is always more than the maxResource limit - Key: YARN-3126 URL: https://issues.apache.org/jira/browse/YARN-3126 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. 
Reporter: Xia Hu Labels: BB2015-05-TBR, assignContainer, fairscheduler, resources Fix For: trunk-win Attachments: resourcelimit-02.patch, resourcelimit-test.patch, resourcelimit.patch When submitting spark application(both spark-on-yarn-cluster and spark-on-yarn-cleint model), the queue's usedResources assigned by fairscheduler always can be more than the queue's maxResources limit. And by reading codes of fairscheduler, I suppose this issue happened because of ignore to check the request resources when assign Container. Here is the detail: 1. choose a queue. In this process, it will check if queue's usedResource is bigger than its max, with assignContainerPreCheck. 2. then choose a app in the certain queue. 3. then choose a container. And here is the question, there is no check whether this container would make the queue sources over its max limit. If a queue's usedResource is 13G, the maxResource limit is 16G, then a container which asking for 4G resources may be assigned successful. This problem will always happen in spark application, cause we can ask for different container resources in different applications. By the way, I have already use the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in FileSystemRMStateStore.java
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550051#comment-14550051 ] Tsuyoshi Ozawa commented on YARN-3677: -- [~ajisakaa] thank you for finding the issue. The commit message says that the contribution was done by [~asuresh]. I think we should revert the change if the JIRA has not been opened yet - we should discuss that point. IMHO, we shouldn't switch the behaviour based on whether HDFS is used or not without a special reason. Fix findbugs warnings in FileSystemRMStateStore.java Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Priority: Minor Labels: newbie There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550233#comment-14550233 ] Rohith commented on YARN-3646: -- bq. Seems we do not even require exceptionToPolicy for FOREVER policy if we catch the exception in shouldRetry method. make sense to me,will reveiw the patch, thanks Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
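A hedged sketch of the idea quoted in the comment above (not the submitted patch): a forever-style retry policy whose shouldRetry keeps retrying only on connection-level failures and fails fast on application-level exceptions such as ApplicationNotFoundException.
{code}
import java.net.ConnectException;

import org.apache.hadoop.io.retry.RetryPolicy;

// Illustrative sketch: retry forever only while the RM is unreachable.
class RetryForeverOnConnectionFailure implements RetryPolicy {
  @Override
  public RetryAction shouldRetry(Exception e, int retries, int failovers,
      boolean isIdempotentOrAtMostOnce) throws Exception {
    if (e instanceof ConnectException) {
      return RetryAction.RETRY; // connection failure: keep waiting for the RM
    }
    return RetryAction.FAIL;    // e.g. ApplicationNotFoundException: give up immediately
  }
}
{code}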
[jira] [Updated] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2821: Attachment: YARN-2821.005.patch Uploaded 005.patch which adds the tests requested by [~jianhe]. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez3:45454 14/11/04 18:21:39 INFO impl.AMRMClientImpl: Received new token for : onprem-tez4:45454 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=3 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_03, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_04, containerNode=onprem-tez3:45454, containerNodeURI=onprem-tez3:50060, containerResourceMemory1024, 
containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_05, containerNode=onprem-tez4:45454, containerNodeURI=onprem-tez4:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_05 14/11/04 18:21:39 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_04 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_05 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_03 14/11/04 18:21:39 INFO
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: YARN-41-5.patch I am attaching a patch updated to the latest source code, with the above comments addressed as well. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41.patch Instead of waiting for the NM expiry, the RM should remove and handle an NM that has been shut down gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2821) Distributed shell app master becomes unresponsive sometimes
[ https://issues.apache.org/jira/browse/YARN-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550216#comment-14550216 ] Hadoop QA commented on YARN-2821: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 18s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 56s | Tests passed in hadoop-yarn-applications-distributedshell. | | | | 42m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733765/YARN-2821.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eb4c9dd | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7995/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7995/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7995/console | This message was automatically generated. Distributed shell app master becomes unresponsive sometimes --- Key: YARN-2821 URL: https://issues.apache.org/jira/browse/YARN-2821 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.5.1 Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: YARN-2821.002.patch, YARN-2821.003.patch, YARN-2821.004.patch, YARN-2821.005.patch, apache-yarn-2821.0.patch, apache-yarn-2821.1.patch We've noticed that once in a while the distributed shell app master becomes unresponsive and is eventually killed by the RM. snippet of the logs - {noformat} 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: appattempt_1415123350094_0017_01 received 0 previous attempts' running containers on AM registration. 
14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:37 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[memory:10, vCores:1]Priority[0] 14/11/04 18:21:38 INFO impl.AMRMClientImpl: Received new token for : onprem-tez2:45454 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1415123350094_0017_01_02, containerNode=onprem-tez2:45454, containerNodeURI=onprem-tez2:50060, containerResourceMemory1024, containerResourceVirtualCores1 14/11/04 18:21:38 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1415123350094_0017_01_02 14/11/04 18:21:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : onprem-tez2:45454 14/11/04 18:21:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550256#comment-14550256 ] Rohith commented on YARN-3646: -- Thanks for working on this issue.. The patch overall looks good to me. nit : Can the test moved to Yarn package since issue is in Yarn? Otherwise if there is any changed in the RMProxy, test will not run. Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550258#comment-14550258 ] Rohith commented on YARN-3646: -- And I verified in one node cluster by enabling and disabling retryforever policy. Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3543: - Attachment: 0004-YARN-3543.patch ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Labels: BB2015-05-TBR Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can be only done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well so that we can check whether an app is AM managed or not anytime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550268#comment-14550268 ] Hudson commented on YARN-3541: -- FAILURE: Integrated in Hadoop-Yarn-trunk #932 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/932/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java Add version info on timeline service / generic history web UI and REST API -- Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.8.0 Attachments: YARN-3541.1.patch, YARN-3541.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550288#comment-14550288 ] Raju Bairishetti commented on YARN-3646: Thanks [~rohithsharma] for the review. Looks like it is mainly an issue with retry policy. Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550299#comment-14550299 ] Hudson commented on YARN-3541: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/201/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java Add version info on timeline service / generic history web UI and REST API -- Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.8.0 Attachments: YARN-3541.1.patch, YARN-3541.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550314#comment-14550314 ] Hadoop QA commented on YARN-41: --- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 30s | The applied patch generated 18 new checkstyle issues (total was 15, now 33). | | {color:green}+1{color} | whitespace | 0m 15s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 45s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 5m 57s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 49m 59s | Tests passed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 53s | Tests passed in hadoop-yarn-server-tests. 
| | | | 99m 18s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733771/YARN-41-5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eb4c9dd | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/7996/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7996/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7996/console | This message was automatically generated. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: (was: YARN-41-5.patch) The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: YARN-41-5.patch The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551790#comment-14551790 ] Hadoop QA commented on YARN-3626: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 22s | The applied patch generated 1 new checkstyle issues (total was 240, now 238). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 26s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | mapreduce tests | 0m 45s | Tests passed in hadoop-mapreduce-client-common. | | {color:green}+1{color} | yarn tests | 0m 30s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 5s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 58m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734054/YARN-3626.14.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ce53c8e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8015/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8015/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-mapreduce-client-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8015/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8015/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8015/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8015/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8015/console | This message was automatically generated. 
On Windows localized resources are not moved to the front of the classpath when they should be -- Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.7.1 Attachments: YARN-3626.0.patch, YARN-3626.11.patch, YARN-3626.14.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch In response to the mapreduce.job.user.classpath.first setting the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
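The ordering problem described above reduces to putting localized entries at the front of the generated classpath when mapreduce.job.user.classpath.first is set, instead of appending them. A minimal, illustrative sketch (names are invented; this is not the attached patch):
{code}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch only: order classpath entries so that localized resources are
// preferred over system resources when the user asks for it.
public final class ClasspathOrdering {

  private ClasspathOrdering() {
  }

  static List<String> orderClasspath(List<String> systemEntries,
      List<String> localizedEntries, boolean userClasspathFirst) {
    // LinkedHashSet keeps insertion order and drops duplicates that occur
    // when an entry is both localized and already on the system classpath.
    LinkedHashSet<String> ordered = new LinkedHashSet<String>();
    if (userClasspathFirst) {
      ordered.addAll(localizedEntries); // localized resources win
      ordered.addAll(systemEntries);
    } else {
      ordered.addAll(systemEntries);    // default: system resources win
      ordered.addAll(localizedEntries);
    }
    return new ArrayList<String>(ordered);
  }
}
{code}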
[jira] [Created] (YARN-3688) Remove unimplemented option for `hadoop fs -ls` from document in branch-2.7
Akira AJISAKA created YARN-3688: --- Summary: Remove unimplemented option for `hadoop fs -ls` from document in branch-2.7 Key: YARN-3688 URL: https://issues.apache.org/jira/browse/YARN-3688 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Akira AJISAKA {{-t}}, {{-s}}, {{-R}}, and {{-u}} option for {{hadoop fs -ls}} are unimplemented in 2.7.0 but documented in http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/FileSystemShell.html#ls We should fix the document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3651) Tracking url in ApplicationCLI wrong for running application
[ https://issues.apache.org/jira/browse/YARN-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551793#comment-14551793 ] Bibin A Chundatt commented on YARN-3651: Hi [~devaraj.k] and [~jianhe], I agree with the comments, but I feel further improvement can still be made here with respect to security. After discussion with some members: *MR can define a separate config and have NM localize the key files for the AM*. So I am reopening this as an improvement; we can discuss it further. Tracking url in ApplicationCLI wrong for running application Key: YARN-3651 URL: https://issues.apache.org/jira/browse/YARN-3651 Project: Hadoop YARN Issue Type: Bug Components: applications, resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Priority: Minor Application URL in Application CLI wrong Steps to reproduce == 1. Start HA setup insecure mode 2.Configure HTTPS_ONLY 3.Submit application to cluster 4.Execute command ./yarn application -list 5.Observer tracking URL shown {code} 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History server at /IP:45034 Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1 Application-Id --- Tracking-URL application_1431672734347_0003 *http://host-10-19-92-117:13013* {code} *Expected* https://IP:64323/proxy/application_1431672734347_0003 / -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3651) Support SSL for AM webapp
[ https://issues.apache.org/jira/browse/YARN-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3651: --- Summary: Support SSL for AM webapp (was: Tracking url in ApplicationCLI wrong for running application) Support SSL for AM webapp - Key: YARN-3651 URL: https://issues.apache.org/jira/browse/YARN-3651 Project: Hadoop YARN Issue Type: Improvement Components: applications, resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Priority: Minor Application URL in Application CLI wrong Steps to reproduce == 1. Start HA setup insecure mode 2.Configure HTTPS_ONLY 3.Submit application to cluster 4.Execute command ./yarn application -list 5.Observer tracking URL shown {code} 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History server at /IP:45034 Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1 Application-Id --- Tracking-URL application_1431672734347_0003 *http://host-10-19-92-117:13013* {code} *Expected* https://IP:64323/proxy/application_1431672734347_0003 / -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3651) Tracking url in ApplicationCLI wrong for running application
[ https://issues.apache.org/jira/browse/YARN-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3651: --- Issue Type: Improvement (was: Bug) Tracking url in ApplicationCLI wrong for running application Key: YARN-3651 URL: https://issues.apache.org/jira/browse/YARN-3651 Project: Hadoop YARN Issue Type: Improvement Components: applications, resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Priority: Minor Application URL in Application CLI wrong Steps to reproduce == 1. Start HA setup insecure mode 2.Configure HTTPS_ONLY 3.Submit application to cluster 4.Execute command ./yarn application -list 5.Observer tracking URL shown {code} 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History server at /IP:45034 Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1 Application-Id --- Tracking-URL application_1431672734347_0003 *http://host-10-19-92-117:13013* {code} *Expected* https://IP:64323/proxy/application_1431672734347_0003 / -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-3651) Support SSL for AM webapp
[ https://issues.apache.org/jira/browse/YARN-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt reopened YARN-3651: Support SSL for AM webapp - Key: YARN-3651 URL: https://issues.apache.org/jira/browse/YARN-3651 Project: Hadoop YARN Issue Type: Improvement Components: applications, resourcemanager Affects Versions: 2.7.0 Environment: Suse 11 Sp3 Reporter: Bibin A Chundatt Priority: Minor Application URL in Application CLI wrong Steps to reproduce == 1. Start HA setup insecure mode 2.Configure HTTPS_ONLY 3.Submit application to cluster 4.Execute command ./yarn application -list 5.Observer tracking URL shown {code} 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History server at /IP:45034 Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1 Application-Id --- Tracking-URL application_1431672734347_0003 *http://host-10-19-92-117:13013* {code} *Expected* https://IP:64323/proxy/application_1431672734347_0003 / -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raju Bairishetti updated YARN-3646: --- Attachment: YARN-3646.001.patch Added a new unit test in hadoop-yarn-client. [~rohithsharma] Could you please review? Ran the test without starting the RM and then test was getting timeout. Ran the test by starting the RM then client is getting ApplicationNotFoundException for older/invalid appId. {code} rm = new ResourceManager(); rm.init(conf); rm.start(); {code} Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Attachments: YARN-3646.001.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use FOREVER retry policy. Yarn client is infinitely retrying in case of exceptions from the RM as it is using retrying policy as FOREVER. The problem is it is retrying for all kinds of exceptions (like ApplicationNotFoundException), even though it is not a connection failure. Due to this my application is not progressing further. *Yarn client should not retry infinitely in case of non connection failures.* We have written a simple yarn-client which is trying to get an application report for an invalid or older appId. ResourceManager is throwing an ApplicationNotFoundException as this is an invalid or older appId. But because of retry policy FOREVER, client is keep on retrying for getting the application report and ResourceManager is throwing ApplicationNotFoundException continuously. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
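Based on the description of the test, its shape is roughly the following. This is a hedged reconstruction, not the attached YARN-3646.001.patch; the class and test names are invented, and it assumes an RM has been started in the test setup as in the snippet quoted above.
{code}
import static org.junit.Assert.fail;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.junit.Test;

public class TestYarnClientRetryForever {

  // With an RM running, looking up an unknown application id should surface
  // ApplicationNotFoundException instead of retrying forever; the timeout
  // guards against the old behaviour of infinite retries.
  @Test(timeout = 30000)
  public void testNoInfiniteRetryOnApplicationNotFound() throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);

    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
    try {
      ApplicationId unknownAppId = ApplicationId.newInstance(1430126768987L, 10645);
      yarnClient.getApplicationReport(unknownAppId);
      fail("Expected ApplicationNotFoundException for an unknown application id");
    } catch (ApplicationNotFoundException expected) {
      // desired behaviour: a definitive failure, not endless retries
    } finally {
      yarnClient.stop();
    }
  }
}
{code}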
[jira] [Commented] (YARN-3685) NodeManager unnecessarily knows about classpath-jars due to Windows limitations
[ https://issues.apache.org/jira/browse/YARN-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551482#comment-14551482 ] Vinod Kumar Vavilapalli commented on YARN-3685: --- My first reaction is that this jar can be generated by the app (say MR client or AM) and passed to YARN. In the worst case, this should get thrown out of ContainerExecutor and instead become a specific implementation only on the Windows-executor which inspects env variables and does this mangling. /cc [~cnauroth], [~rusanu] who worked in this area before. Thoughts? NodeManager unnecessarily knows about classpath-jars due to Windows limitations --- Key: YARN-3685 URL: https://issues.apache.org/jira/browse/YARN-3685 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Found this while looking at cleaning up ContainerExecutor via YARN-3648, making it a sub-task. YARN *should not* know about classpaths. Our original design modeled around this. But when we added windows suppport, due to classpath issues, we ended up breaking this abstraction via YARN-316. We should clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551826#comment-14551826 ] Rohith commented on YARN-2268: -- Thanks [~sunilg] [~jianhe] [~kasha] for sharing your thoughts. bq. Given we recommend using the ZK-store when using HA, how about adding this for the ZK-store using an ephemeral znode for lock first? +1, given the ZK-store (ZKRMStateStore) is the recommended state store. bq. How about creating a lock file and declaring it stale after a stipulated period of time. If we use a stipulated period, I am thinking that within that period neither can the RM be started nor can the state store be formatted. Also, the lock file has to be stored in HDFS regardless of which RMStateStore is used, which is an extra binding to the filesystem. I am thinking, why can't we use the general approach of polling the RM web service? It would give a more accurate state. Disallow formatting the RMStateStore when there is an RM running Key: YARN-2268 URL: https://issues.apache.org/jira/browse/YARN-2268 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Rohith Attachments: 0001-YARN-2268.patch YARN-2131 adds a way to format the RMStateStore. However, it can be a problem if we format the store while an RM is actively using it. It would be nice to fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
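The ephemeral-znode idea being discussed would roughly work as follows: the active RM creates an ephemeral lock znode, and the format command refuses to run while that znode exists; if the RM dies, its session expires and the znode disappears, so a crashed RM never blocks a later format. A sketch using Apache Curator with invented paths and method names, not code from any patch:
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;

// Sketch only: guard state-store formatting with an ephemeral "active RM" znode.
public class ZKStoreFormatGuard {

  private static final String LOCK_PATH = "/rmstore/ZKRMStateRoot/RM_ACTIVE_LOCK";

  // Called by a running RM once it is connected to ZooKeeper.
  static void acquireActiveLock(CuratorFramework zk) throws Exception {
    // EPHEMERAL: the znode goes away with the RM's session, so a dead RM
    // cannot permanently block formatting.
    zk.create().creatingParentsIfNeeded()
        .withMode(CreateMode.EPHEMERAL)
        .forPath(LOCK_PATH);
  }

  // Called by the format command before it touches the store.
  static void checkNoActiveRM(CuratorFramework zk) throws Exception {
    if (zk.checkExists().forPath(LOCK_PATH) != null) {
      throw new IllegalStateException(
          "An RM appears to be running and using this state store; refusing to format");
    }
  }
}
{code}
The format path would call checkNoActiveRM before clearing the root znode, which avoids both the stale-lock-file and the stipulated-period problems raised above.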
[jira] [Commented] (YARN-2923) Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551841#comment-14551841 ] Hadoop QA commented on YARN-2923: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 38s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 38s | The applied patch generated 1 new checkstyle issues (total was 214, now 215). | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 21s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 8s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 48m 52s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733391/YARN-2923.20150517-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ce53c8e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8016/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8016/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8016/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8016/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8016/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8016/console | This message was automatically generated. Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup - Key: YARN-2923 URL: https://issues.apache.org/jira/browse/YARN-2923 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2923.20141204-1.patch, YARN-2923.20141210-1.patch, YARN-2923.20150328-1.patch, YARN-2923.20150404-1.patch, YARN-2923.20150517-1.patch As part of Distributed Node Labels configuration we need to support Node labels to be configured in Yarn-site.xml. 
And on modification of Node Labels configuration in yarn-site.xml, NM should be able to get modified Node labels from this NodeLabelsprovider service without NM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
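One way the requirement above could be met is a provider service inside the NM that periodically re-reads the configured labels, so edits to yarn-site.xml take effect without an NM restart. The sketch below is speculative: the property name, class name and refresh mechanism are all assumptions, not the YARN-2923 design.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

import org.apache.hadoop.conf.Configuration;

// Sketch only: periodically re-read a (hypothetical) node-labels property.
public class ConfigurationNodeLabelsProvider {

  // Hypothetical property: comma-separated labels for this NodeManager.
  static final String NM_NODE_LABELS =
      "yarn.nodemanager.node-labels.provider.configured-node-labels";

  private volatile Set<String> currentLabels = new HashSet<String>();

  public void start(long intervalMs) {
    new Timer("node-labels-refresh", true).schedule(new TimerTask() {
      @Override
      public void run() {
        // A fresh Configuration re-parses yarn-site.xml, so edited labels
        // become visible without restarting the NM.
        Configuration conf = new Configuration();
        conf.addResource("yarn-site.xml");
        String raw = conf.get(NM_NODE_LABELS, "").trim();
        Set<String> labels = new HashSet<String>();
        if (!raw.isEmpty()) {
          labels.addAll(Arrays.asList(raw.split("\\s*,\\s*")));
        }
        currentLabels = labels;
      }
    }, 0, intervalMs);
  }

  public Set<String> getNodeLabels() {
    return currentLabels;
  }
}
{code}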
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551854#comment-14551854 ] Varun Saxena commented on YARN-3678: [~vinodkv], as this issue happened in our customer deployment, I will explain the issue. We got an issue wherein NM was being randomly killed at one of the places where Hadoop distribution is deployed. In logs, we could see NM being killed immediately after {{signalContainer}} is called. What happens is as under : # LCE sends a SIGTERM to the container and waits for 250 ms # Probably within this 250 ms period, container processes the signal and exits gracefully. # Now it is possible the pid assigned to container is taken up by some other process or thread(which run as light weight processes in Linux). # When LCE again tries to send a SIGKILL to the same pid, it might actually be sending it to another process or thread. # As we could not find any other reason for NM going randomly down, we suspect it may have gone down because some new thread of NM took up this pid and SIGKILL may have been sent to it, which may have crashed NM. This is more based on suspicion though rather than fool proof analysis. Not sure how to verify if this indeed happened. Pls note that {{pid_max}} in the deployment was {{32768}}. I am not sure about which user was the process owner though. Probably [~gu chi] can shed some light on that. An additional check can be done IMHO. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished, then it will do clean up, the PID file still exist and will trigger once singalContainer, this will kill the process with the pid in PID file, but as container already finished, so this PID may be occupied by other process, this may cause serious issue. As I know, my NM was killed unexpectedly, what I described can be the cause. Even rarely occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
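The additional check mentioned at the end of the comment could, on Linux, be as simple as confirming that the pid still belongs to the container's user before the delayed SIGKILL is sent. A /proc-based sketch with invented names; this is not the actual ContainerExecutor code:
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: verify the owner of a pid before sending the delayed SIGKILL,
// so a reused pid belonging to another user (or to the NM itself) is skipped.
public final class PidOwnerCheck {

  private PidOwnerCheck() {
  }

  // Returns the real uid owning /proc/[pid], or -1 if the process is gone.
  static int ownerUidOf(int pid) throws IOException {
    Path status = Paths.get("/proc", Integer.toString(pid), "status");
    if (!Files.exists(status)) {
      return -1; // pid already exited; nothing to kill
    }
    for (String line : Files.readAllLines(status, StandardCharsets.UTF_8)) {
      if (line.startsWith("Uid:")) {
        // Line format: "Uid:  <real> <effective> <saved> <fs>"
        return Integer.parseInt(line.split("\\s+")[1]);
      }
    }
    return -1;
  }

  static boolean safeToSignal(int pid, int expectedContainerUid) throws IOException {
    // If the pid was reused by a process of a different user, skip the kill.
    return ownerUidOf(pid) == expectedContainerUid;
  }
}
{code}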
[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551506#comment-14551506 ] Karthik Kambatla commented on YARN-3655: If allocating a container is going to take the amShare over the maxAMShare, not allocating and hence unreserving resources seems reasonable. That said, we should also add the same check before making such a reservation in FSAppAttempt#assignContainer. There is already a check to ensure we won't go over maxShare. In terms of code organization, I would like for us to create a helper method (okayToReserveResources) that would check the maxShare for all containers and maxAMShare for AM containers. Also, looking at the code, I see fitsInMaxShare method is a static in FairScheduler. We should just make it a non-static method in FSQueue, it can call parent.fitsInMaxShare. Can we file a follow-up JIRA for it? FairScheduler: potential livelock due to maxAMShare limitation and container reservation - Key: YARN-3655 URL: https://issues.apache.org/jira/browse/YARN-3655 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3655.000.patch, YARN-3655.001.patch FairScheduler: potential livelock due to maxAMShare limitation and container reservation. If a node is reserved by an application, all the other applications don't have any chance to assign a new container on this node, unless the application which reserves the node assigns a new container on this node or releases the reserved container on this node. The problem is if an application tries to call assignReservedContainer and fail to get a new container due to maxAMShare limitation, it will block all other applications to use the nodes it reserves. If all other running applications can't release their AM containers due to being blocked by these reserved containers. A livelock situation can happen. The following is the code at FSAppAttempt#assignContainer which can cause this potential livelock. {code} // Check the AM resource usage for the leaf queue if (!isAmRunning() !getUnmanagedAM()) { ListResourceRequest ask = appSchedulingInfo.getAllResourceRequests(); if (ask.isEmpty() || !getQueue().canRunAppAM( ask.get(0).getCapability())) { if (LOG.isDebugEnabled()) { LOG.debug(Skipping allocation because maxAMShare limit would + be exceeded); } return Resources.none(); } } {code} To fix this issue, we can unreserve the node if we can't allocate the AM container on the node due to Max AM share limitation and the node is reserved by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
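The helper suggested here (okayToReserveResources) would essentially apply the same headroom checks before holding a reservation as before an allocation. A hedged sketch of that check, with invented parameter names and detached from the real FSAppAttempt/FSQueue plumbing:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Sketch only: refuse to hold a reservation that could never be satisfied.
final class ReservationChecks {

  private ReservationChecks() {
  }

  static boolean okayToReserveResources(Resource requested, boolean isAmContainer,
      Resource queueUsage, Resource queueMaxShare,
      Resource amResourceUsage, Resource maxAMResource) {
    // Never reserve beyond the queue's maxShare.
    if (!Resources.fitsIn(Resources.add(queueUsage, requested), queueMaxShare)) {
      return false;
    }
    // For AM containers, also respect maxAMShare; otherwise the reservation
    // pins the node while the AM can never actually be launched there.
    if (isAmContainer
        && !Resources.fitsIn(Resources.add(amResourceUsage, requested), maxAMResource)) {
      return false;
    }
    return true;
  }
}
{code}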
[jira] [Updated] (YARN-3645) ResourceManager can't start success if attribute value of aclSubmitApps is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Liptak updated YARN-3645: --- Attachment: YARN-3645.patch ResourceManager can't start success if attribute value of aclSubmitApps is null in fair-scheduler.xml Key: YARN-3645 URL: https://issues.apache.org/jira/browse/YARN-3645 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.2 Reporter: zhoulinlin Attachments: YARN-3645.patch The aclSubmitApps is configured in fair-scheduler.xml like below: queue name=mr aclSubmitApps/aclSubmitApps /queue The resourcemanager log: 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) Caused by: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299) ... 
9 more 2015-05-14 12:59:48,623 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state 2015-05-14 12:59:48,623 INFO com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin transitionToStandbyIn 2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When stopping the service ResourceManager : java.lang.NullPointerException java.lang.NullPointerException at com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) 2015-05-14 12:59:48,623 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493)
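The failing configuration appears to be an empty element (JIRA formatting strips the angle brackets), i.e. an aclSubmitApps element with no text. An empty element has no text child, so getFirstChild() presumably returns null inside AllocationFileLoaderService.loadQueue and the NullPointerException above follows. A hedged sketch of the kind of null-safe text extraction the fix needs (the helper name is invented):
{code}
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

// Sketch only: treat an empty ACL element as an empty string instead of
// dereferencing a missing text node.
final class AclText {

  private AclText() {
  }

  static String textOrEmpty(Element element) {
    Node child = element.getFirstChild();
    if (child instanceof Text) {
      return ((Text) child).getData().trim();
    }
    return ""; // empty element: behave as if no ACL were specified
  }
}
{code}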
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551666#comment-14551666 ] Vinod Kumar Vavilapalli commented on YARN-3678: --- The default delay is 250 milliseconds. So it is very hard to hit this condition. At least when LinuxContainerExecutor is used, the kill is done as the user itself, so it's unlikely it will affect other users' processes. Other than also doing a user-check to ensure its the same user's container, I am not sure what else can be done. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished, then it will do clean up, the PID file still exist and will trigger once singalContainer, this will kill the process with the pid in PID file, but as container already finished, so this PID may be occupied by other process, this may cause serious issue. As I know, my NM was killed unexpectedly, what I described can be the cause. Even rarely occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3645) ResourceManager can't start success if attribute value of aclSubmitApps is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551742#comment-14551742 ] Hadoop QA commented on YARN-3645: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 50s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 47s | The applied patch generated 7 new checkstyle issues (total was 27, now 33). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 11s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 86m 47s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734036/YARN-3645.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ce53c8e | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8014/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8014/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8014/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8014/console | This message was automatically generated. 
ResourceManager can't start success if attribute value of aclSubmitApps is null in fair-scheduler.xml Key: YARN-3645 URL: https://issues.apache.org/jira/browse/YARN-3645 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.2 Reporter: zhoulinlin Attachments: YARN-3645.patch The aclSubmitApps is configured in fair-scheduler.xml like below: queue name=mr aclSubmitApps/aclSubmitApps /queue The resourcemanager log: 2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159) Caused by: java.io.IOException: Failed to initialize FairScheduler at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318) at
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551759#comment-14551759 ] Ming Ma commented on YARN-221: -- Thanks [~xgong]. You raise some valid points about abstraction. Here are my takes on this. It appears the main requirements are:
* There needs to be a cluster-wide default log aggregation policy at the YARN layer. That should be extensible. To change it and add a new policy, it is ok to require an NM restart, given the NM needs to load the policy object.
* Any YARN application can override the default YARN policy with its own log aggregation policy. This application-specific policy can come from the list of available policies provided at the YARN layer. There is no need to provide the ability for the application to submit a new policy implementation on the fly.
Given these:
* Abstraction via an interface seems like a good idea. The ContainerLogAggregationPolicy interface can include the following method to address all the policies that we know of so far. However, it seems we might end up with many policies given the possible permutations, e.g., AMContainerLogAndFailWorkerContainerOnlyLogAggregationPolicy, AMContainerLogAndFailOrKilledWorkerContainerOnlyLogAggregationPolicy, etc.
{noformat}
public interface ContainerLogAggregationPolicy {
  public boolean shouldDoLogAggregation(ContainerId containerId, int exitCode);
}
{noformat}
* The cluster-wide default policy at the YARN layer is configurable.
{noformat}
<property>
  <name>yarn.nodemanager.container-log-aggregation-policy.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.container-log-aggregation-policy.AllContainerLogAggregationPolicy</value>
</property>
{noformat}
* All the known policies will be part of YARN, including SampleRateContainerLogAggregationPolicy. So we still need to configure the sample rate for that policy. If we don't put it in YarnConfiguration, where can we put it? It seems we already have a bunch of configuration properties in YarnConfiguration that are specific to a plugin implementation, such as the container executor properties.
* Should ContainerLogAggregationPolicy be part of ContainerLaunchContext or LogAggregationContext? It seems LogAggregationContext is a better fit. That also means ContainerLogAggregationPolicy will be specified as part of ApplicationSubmissionContext. For an application to specify a log policy, the policy class needs to be loadable by the NM. So LogAggregationContext will have new methods like:
{noformat}
public abstract class LogAggregationContext {
  public void setContainerLogPolicyClass(Class<? extends ContainerLogAggregationPolicy> logPolicy);
  public Class<? extends ContainerLogAggregationPolicy> getContainerLogPolicyClass();
}
{noformat}
* How MR overrides the default policy: maybe we can have YarnRunner at the MR level honor the yarn property yarn.container-log-aggregation-policy.class at the per-job level when it creates the ApplicationSubmissionContext with the proper LogAggregationContext. That way we don't have to create extra log aggregation properties specific to the MR layer.
NM should provide a way for AM to tell it not to aggregate logs. 
Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Attachments: YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
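As a concrete illustration of the interface proposed in the comment above, here is a minimal sketch of one possible policy; the class name FailedContainerOnlyLogAggregationPolicy is hypothetical and is not part of any patch attached to this issue.
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical example against the proposed ContainerLogAggregationPolicy
// interface: aggregate logs only for containers that exited abnormally.
public class FailedContainerOnlyLogAggregationPolicy
    implements ContainerLogAggregationPolicy {

  @Override
  public boolean shouldDoLogAggregation(ContainerId containerId, int exitCode) {
    // Exit code 0 means the container succeeded; only keep logs for failures.
    return exitCode != 0;
  }
}
{code}
The cluster-wide default would then be selected by pointing the proposed yarn.nodemanager.container-log-aggregation-policy.class property at such a class.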
[jira] [Created] (YARN-3685) NodeManager unnecessarily knows about classpath-jars due to Windows limitations
Vinod Kumar Vavilapalli created YARN-3685: - Summary: NodeManager unnecessarily knows about classpath-jars due to Windows limitations Key: YARN-3685 URL: https://issues.apache.org/jira/browse/YARN-3685 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Found this while looking at cleaning up ContainerExecutor via YARN-3648, making it a sub-task. YARN *should not* know about classpaths. Our original design was modeled around this. But when we added Windows support, due to classpath issues, we ended up breaking this abstraction via YARN-316. We should clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551706#comment-14551706 ] Hadoop QA commented on YARN-2336: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 59s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 47s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 3m 0s | Site still builds. | | {color:green}+1{color} | checkstyle | 0m 47s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 49m 59s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 93m 19s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734018/YARN-2336.009.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / 7401e5b | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8013/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8013/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8013/console | This message was automatically generated. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.009.patch, YARN-2336.009.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, REST api returns a missing '[' blacket JSON for childQueues. This issue found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551477#comment-14551477 ] Hudson commented on YARN-3565: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7870 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7870/]) YARN-3565. NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String. (Naganarasimha G R via wangda) (wangda: rev b37da52a1c4fb3da2bd21bfadc5ec61c5f953a59) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch Now NM HB/Register uses SetString, it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3677: - Summary: Fix findbugs warnings in yarn-server-resourcemanager (was: Fix findbugs warnings in FileSystemRMStateStore) Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
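For readers unfamiliar with this findbugs category, the sketch below shows the two usual ways to make access to such a flag consistent. It is only an illustration under the assumption of a simple boolean field, not the actual YARN-3677 change.
{code}
// Illustrative only; not the YARN-3677 patch.
public class SynchronizationExample {

  // Option 1: declare the flag volatile so unsynchronized reads stay safe.
  private volatile boolean isHDFS;

  public boolean isHDFS() {
    return isHDFS;
  }

  // Option 2: route every read and write through the same lock
  // (here, synchronized methods), so locking is consistent everywhere.
  private boolean secureMode;

  public synchronized boolean isSecureMode() {
    return secureMode;
  }

  public synchronized void setSecureMode(boolean secureMode) {
    this.secureMode = secureMode;
  }
}
{code}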
[jira] [Updated] (YARN-3677) Fix findbugs warnings in FileSystemRMStateStore
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3677: - Summary: Fix findbugs warnings in FileSystemRMStateStore (was: Fix findbugs warnings in yarn-server-resourcemanager) Fix findbugs warnings in FileSystemRMStateStore --- Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551526#comment-14551526 ] Hudson commented on YARN-3677: -- FAILURE: Integrated in Hadoop-trunk-Commit #7872 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7872/]) YARN-3677. Fix findbugs warnings in yarn-server-resourcemanager. Contributed by Vinod Kumar Vavilapalli. (ozawa: rev 7401e5b5e8060b6b027d714b5ceb641fcfe5b598) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3687) We should be able to remove node-label if there's no queue can use it.
Wangda Tan created YARN-3687: Summary: We should be able to remove node-label if there's no queue can use it. Key: YARN-3687 URL: https://issues.apache.org/jira/browse/YARN-3687 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Currently, we cannot remove a node label from the cluster even if there is no queue configured to use it, but actually we should be able to remove it if the capacity on that node label in the root queue is 0. This avoids pain when a user wants to reconfigure node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-3686: - Assignee: Sunil G CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551681#comment-14551681 ] gu-chi commented on YARN-3678: -- I see the possibility is low, but with a heavy task load it occurs frequently. I would suggest adding a check before the kill to verify that the process ID still belongs to the container. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do cleanup, but the PID file still exists and will trigger signalContainer once. This kills the process with the PID from the PID file, but as the container has already finished, that PID may by then be occupied by another process, which may cause serious issues. As far as I know, my NM was killed unexpectedly, and what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
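The suggested pre-kill check could look roughly like the sketch below on Linux; the class and method names are hypothetical and this is not code from any attached patch.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical Linux-only check: before signalling, verify that the pid read
// from the pid file still looks like the container's process.
public final class PidOwnershipCheck {

  public static boolean pidStillBelongsToContainer(String pid, String containerId) {
    Path cmdline = Paths.get("/proc", pid, "cmdline");
    if (!Files.exists(cmdline)) {
      return false; // process already gone, nothing to kill
    }
    try {
      // /proc/<pid>/cmdline is NUL-separated; a container launch command
      // normally embeds the container id in its arguments.
      String cmd = new String(Files.readAllBytes(cmdline)).replace('\0', ' ');
      return cmd.contains(containerId);
    } catch (IOException e) {
      return false; // be conservative: do not signal if we cannot verify
    }
  }
}
{code}
A check like this narrows the window, but PID reuse between the check and the kill is still possible, so it is a mitigation rather than a complete fix.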
[jira] [Updated] (YARN-3626) On Windows localized resources are not moved to the front of the classpath when they should be
[ https://issues.apache.org/jira/browse/YARN-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3626: -- Attachment: YARN-3626.14.patch On Windows localized resources are not moved to the front of the classpath when they should be -- Key: YARN-3626 URL: https://issues.apache.org/jira/browse/YARN-3626 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.7.1 Attachments: YARN-3626.0.patch, YARN-3626.11.patch, YARN-3626.14.patch, YARN-3626.4.patch, YARN-3626.6.patch, YARN-3626.9.patch In response to the mapreduce.job.user.classpath.first setting the classpath is ordered differently so that localized resources will appear before system classpath resources when tasks execute. On Windows this does not work because the localized resources are not linked into their final location when the classpath jar is created. To compensate for that localized jar resources are added directly to the classpath generated for the jar rather than being discovered from the localized directories. Unfortunately, they are always appended to the classpath, and so are never preferred over system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
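A minimal sketch of the ordering rule being described, assuming we simply have the list of localized jars and the system classpath entries in hand; names are illustrative and this is not the attached patch.
{code}
import java.util.ArrayList;
import java.util.List;

// Illustration of the intended behavior: with user-classpath-first enabled,
// localized jars are placed before system entries instead of being appended.
public final class ClasspathOrderingExample {

  public static List<String> buildClasspath(List<String> systemEntries,
      List<String> localizedJars, boolean userClasspathFirst) {
    List<String> cp = new ArrayList<>();
    if (userClasspathFirst) {
      cp.addAll(localizedJars);   // localized resources win on conflicts
      cp.addAll(systemEntries);
    } else {
      cp.addAll(systemEntries);
      cp.addAll(localizedJars);   // default: system resources win
    }
    return cp;
  }
}
{code}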
[jira] [Commented] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551749#comment-14551749 ] Craig Welch commented on YARN-3681: --- Tested my own version of this patch yesterday which does the same thing and works, so +1 LGTM yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.01.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551745#comment-14551745 ] Karthik Kambatla commented on YARN-2268: Given we recommend using the ZK-store when using HA, how about adding this for the ZK-store first, using an ephemeral znode as a lock? We could think of alternate ways for the other stores. How about creating a lock file and declaring it stale after a stipulated period of time? It is a hacky approach, but it might suffice. Disallow formatting the RMStateStore when there is an RM running Key: YARN-2268 URL: https://issues.apache.org/jira/browse/YARN-2268 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Rohith Attachments: 0001-YARN-2268.patch YARN-2131 adds a way to format the RMStateStore. However, it can be a problem if we format the store while an RM is actively using it. It would be nice to fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
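A rough sketch of the ephemeral-znode idea using the plain ZooKeeper client; the lock path and class name are made up for illustration and do not come from any patch here.
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch: whichever process needs exclusive use of the store
// (a running RM, or a format command) tries to create an ephemeral lock node.
public final class FormatLockSketch {

  private static final String LOCK_PATH = "/rmstore/format-lock";

  public static boolean tryAcquireFormatLock(ZooKeeper zk)
      throws KeeperException, InterruptedException {
    try {
      zk.create(LOCK_PATH, new byte[0],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;
    } catch (KeeperException.NodeExistsException e) {
      return false; // another process already holds the lock
    }
  }
}
{code}
Because the znode is ephemeral it disappears when the owner's session ends, so there is no stale lock to clean up, which is the main advantage over a time-based lock file.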
[jira] [Updated] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.
[ https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3684: Attachment: YARN-3684.002.patch Attaching a (correctly-named) patch with fixes to findbugs issues Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information. - Key: YARN-3684 URL: https://issues.apache.org/jira/browse/YARN-3684 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3648.001.patch, YARN-3684.002.patch As per description in parent JIRA : Adding additional arguments to key ContainerExecutor methods ( e.g startLocalizer or launchContainer ) would break the existing ContainerExecutor interface and would require changes to all executor implementations in YARN. In order to make this interface less brittle in the future, it would make sense to encapsulate arguments in some kind of a ‘context’ object which could be modified/extended without breaking the ContainerExecutor interface in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
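A minimal sketch of the "context object" pattern described above, assuming a builder-style holder for the launch parameters; the names are illustrative and do not match the classes in the attached patches.
{code}
// Hypothetical: bundle the arguments of a lifecycle method into one
// extensible holder so new fields can be added later without changing
// the ContainerExecutor method signature.
public final class ContainerStartContextSketch {

  private final String user;
  private final String containerId;

  private ContainerStartContextSketch(Builder b) {
    this.user = b.user;
    this.containerId = b.containerId;
  }

  public String getUser() { return user; }
  public String getContainerId() { return containerId; }

  public static final class Builder {
    private String user;
    private String containerId;

    public Builder setUser(String user) { this.user = user; return this; }
    public Builder setContainerId(String id) { this.containerId = id; return this; }

    public ContainerStartContextSketch build() {
      return new ContainerStartContextSketch(this);
    }
  }
}
{code}
A caller would then build the context once and pass a single argument, e.g. new ContainerStartContextSketch.Builder().setUser("alice").setContainerId("container_1").build(), and adding a new field later only touches the context class.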
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551513#comment-14551513 ] Hudson commented on YARN-3583: -- FAILURE: Integrated in Hadoop-trunk-Commit #7871 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7871/]) YARN-3583. Support of NodeLabel object instead of plain String in YarnClient side. (Sunil G via wangda) (wangda: rev 563eb1ad2ae848a23bbbf32ebfaf107e8fa14e87) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/ReplaceLabelsOnNodeRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetNodesToLabelsResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetNodesToLabelsResponse.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch, 0004-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of using plain label name. This will help to bring other label details such as Exclusivity to client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3667) Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS
[ https://issues.apache.org/jira/browse/YARN-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551543#comment-14551543 ] Tsuyoshi Ozawa commented on YARN-3667: -- [~zxu] [~leftnoteasy] thank you for taking this issue. I've missed this issue - YARN-3677 has been committed a few minutes ago. Can we close this as duplicated? Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS - Key: YARN-3667 URL: https://issues.apache.org/jira/browse/YARN-3667 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.8.0 Attachments: YARN-3667.000.patch Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS This findbugs warning is reported at https://builds.apache.org/job/PreCommit-YARN-Build/7956/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3601) Fix UT TestRMFailover.testRMWebAppRedirect
[ https://issues.apache.org/jira/browse/YARN-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551651#comment-14551651 ] Weiwei Yang commented on YARN-3601: --- Thank you [~xgong] Fix UT TestRMFailover.testRMWebAppRedirect -- Key: YARN-3601 URL: https://issues.apache.org/jira/browse/YARN-3601 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Environment: Red Hat Enterprise Linux Workstation release 6.5 (Santiago) Reporter: Weiwei Yang Assignee: Weiwei Yang Priority: Critical Labels: test Fix For: 2.7.1 Attachments: YARN-3601.001.patch This test case was not working since the commit from YARN-2605. It failed with NPE exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3681) yarn cmd says could not find main class 'queue' in windows
[ https://issues.apache.org/jira/browse/YARN-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3681: -- Priority: Blocker (was: Critical) Target Version/s: 2.7.1 (was: 2.8.0) Sounds like a 2.7.1 blocker to me.. yarn cmd says could not find main class 'queue' in windows Key: YARN-3681 URL: https://issues.apache.org/jira/browse/YARN-3681 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Environment: Windows Only Reporter: Sumana Sathish Assignee: Varun Saxena Priority: Blocker Labels: windows, yarn-client Attachments: YARN-3681.01.patch, yarncmd.png Attached the screenshot of the command prompt in windows running yarn queue command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551769#comment-14551769 ] Naganarasimha G R commented on YARN-2729: - Hi [~wangda] [~vinodkv], Any further thoughts on the above comments ? Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup --- Key: YARN-2729 URL: https://issues.apache.org/jira/browse/YARN-2729 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch, YARN-2729.20150517-1.patch Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3667) Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS
[ https://issues.apache.org/jira/browse/YARN-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551426#comment-14551426 ] Wangda Tan commented on YARN-3667: -- +1, will commit it later. Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS - Key: YARN-3667 URL: https://issues.apache.org/jira/browse/YARN-3667 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.8.0 Attachments: YARN-3667.000.patch Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS This findbugs warning is reported at https://builds.apache.org/job/PreCommit-YARN-Build/7956/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3667) Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS
[ https://issues.apache.org/jira/browse/YARN-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3667: - Fix Version/s: 2.8.0 Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS - Key: YARN-3667 URL: https://issues.apache.org/jira/browse/YARN-3667 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.8.0 Attachments: YARN-3667.000.patch Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS This findbugs warning is reported at https://builds.apache.org/job/PreCommit-YARN-Build/7956/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3609) Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case.
[ https://issues.apache.org/jira/browse/YARN-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551428#comment-14551428 ] Wangda Tan commented on YARN-3609: -- Findbugs warning is tracked by: https://issues.apache.org/jira/browse/YARN-3667 Move load labels from storage from serviceInit to serviceStart to make it works with RM HA case. Key: YARN-3609 URL: https://issues.apache.org/jira/browse/YARN-3609 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3609.1.preliminary.patch, YARN-3609.2.patch, YARN-3609.3.patch Now RMNodeLabelsManager loads label when serviceInit, but RMActiveService.start() is called when RM HA transition happens. We haven't done this before because queue's initialization happens in serviceInit as well, we need make sure labels added to system before init queue, after YARN-2918, we should be able to do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
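A small sketch of the lifecycle change being described, assuming a service built on Hadoop's AbstractService; the class name and the loadLabelsFromStore method are placeholders, not the actual RMNodeLabelsManager code.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Illustration: recovery from the store happens in serviceStart() (run on
// every transition to active) rather than serviceInit() (run only once).
public class LabelManagerSketch extends AbstractService {

  public LabelManagerSketch() {
    super("LabelManagerSketch");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // configuration only; no store access here
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    loadLabelsFromStore(); // deferred until the service actually starts
    super.serviceStart();
  }

  private void loadLabelsFromStore() {
    // placeholder for the recovery logic
  }
}
{code}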
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551467#comment-14551467 ] Hadoop QA commented on YARN-3677: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 50m 3s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 85m 55s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733965/YARN-3677-20150519.txt | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7438966 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8011/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8011/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8011/console | This message was automatically generated. Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551494#comment-14551494 ] Tsuyoshi Ozawa commented on YARN-3677: -- +1, committing this shortly. Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.
[ https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551566#comment-14551566 ] Sidharta Seethana commented on YARN-3684: - [~vinodkv] , could you please review this patch? Thanks. Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information. - Key: YARN-3684 URL: https://issues.apache.org/jira/browse/YARN-3684 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3648.001.patch, YARN-3684.002.patch As per description in parent JIRA : Adding additional arguments to key ContainerExecutor methods ( e.g startLocalizer or launchContainer ) would break the existing ContainerExecutor interface and would require changes to all executor implementations in YARN. In order to make this interface less brittle in the future, it would make sense to encapsulate arguments in some kind of a ‘context’ object which could be modified/extended without breaking the ContainerExecutor interface in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3686) CapacityScheduler should trim default_node_label_expression
Wangda Tan created YARN-3686: Summary: CapacityScheduler should trim default_node_label_expression Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Priority: Critical We should trim default_node_label_expression for queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-2336: - Attachment: YARN-2336.009.patch Submitting Akira's patch again since YARN-3677 is fixed now. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1, 2.6.0 Reporter: Kenji Kikushima Assignee: Akira AJISAKA Labels: BB2015-05-RFC Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.005.patch, YARN-2336.007.patch, YARN-2336.008.patch, YARN-2336.009.patch, YARN-2336.009.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST api returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3645) ResourceManager can't start success if attribute value of aclSubmitApps is null in fair-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551645#comment-14551645 ] Gabor Liptak commented on YARN-3645: I attached a patch (having trouble running unit tests locally ...) ResourceManager can't start success if attribute value of aclSubmitApps is null in fair-scheduler.xml Key: YARN-3645 URL: https://issues.apache.org/jira/browse/YARN-3645 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.2 Reporter: zhoulinlin Attachments: YARN-3645.patch The aclSubmitApps is configured in fair-scheduler.xml like below:
<queue name="mr">
  <aclSubmitApps></aclSubmitApps>
</queue>
The resourcemanager log:
2015-05-14 12:59:48,623 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler
org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:493)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:920)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:240)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159)
Caused by: java.io.IOException: Failed to initialize FairScheduler
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1301)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1318)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 7 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:458)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:337)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1299)
... 9 more
2015-05-14 12:59:48,623 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2015-05-14 12:59:48,623 INFO com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory: plugin transitionToStandbyIn
2015-05-14 12:59:48,623 WARN org.apache.hadoop.service.AbstractService: When stopping the service ResourceManager : java.lang.NullPointerException
java.lang.NullPointerException
at com.zte.zdh.platformplugin.factory.YarnPlatformPluginProxyFactory.transitionToStandbyIn(YarnPlatformPluginProxyFactory.java:71)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:997)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1058)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1159)
2015-05-14 12:59:48,623 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: java.io.IOException: Failed to initialize FairScheduler
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at
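For illustration, an empty <aclSubmitApps></aclSubmitApps> element has no child text node, so code that reads the element text blindly dereferences null, which matches the NullPointerException in loadQueue above. A hypothetical guard (not the attached patch) could look like this:
{code}
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical helper: read the text of an ACL element, falling back to a
// default when the element is empty instead of throwing an NPE.
public final class AclParsingSketch {

  public static String readAclText(Element element, String defaultValue) {
    Text textNode = (Text) element.getFirstChild();
    if (textNode == null) {
      return defaultValue; // empty element: no text node to read
    }
    return textNode.getData().trim();
  }
}
{code}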
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551756#comment-14551756 ] gu-chi commented on YARN-3678: -- The PID may not be reused by a process only; it can also be a thread. Linux treats processes and threads the same way, and killing one thread of a process may kill the whole process too. For a thread, starting within 250ms is possible, right? DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do cleanup, but the PID file still exists and will trigger signalContainer once. This kills the process with the PID from the PID file, but as the container has already finished, that PID may by then be occupied by another process, which may cause serious issues. As far as I know, my NM was killed unexpectedly, and what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String
[ https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551755#comment-14551755 ] Naganarasimha G R commented on YARN-3565: - Thanks for Reviewing and Committing [~wangda] [~vinodkv] NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String - Key: YARN-3565 URL: https://issues.apache.org/jira/browse/YARN-3565 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, YARN-3565.20150516-1.patch, YARN-3565.20150519-1.patch Now NM HB/Register uses Set<String>, it will be hard to add new fields if we want to support specifying NodeLabel type such as exclusivity/constraints, etc. We need to make sure rolling upgrade works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.
[ https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551557#comment-14551557 ] Hadoop QA commented on YARN-3684: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 22s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 49s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 5s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 9s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 42m 52s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733995/YARN-3684.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b37da52 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8012/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8012/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8012/console | This message was automatically generated. Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information. - Key: YARN-3684 URL: https://issues.apache.org/jira/browse/YARN-3684 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3648.001.patch, YARN-3684.002.patch As per description in parent JIRA : Adding additional arguments to key ContainerExecutor methods ( e.g startLocalizer or launchContainer ) would break the existing ContainerExecutor interface and would require changes to all executor implementations in YARN. In order to make this interface less brittle in the future, it would make sense to encapsulate arguments in some kind of a ‘context’ object which could be modified/extended without breaking the ContainerExecutor interface in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3667) Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS
[ https://issues.apache.org/jira/browse/YARN-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551673#comment-14551673 ] zhihai xu commented on YARN-3667: - [~ozawa], yes, please go ahead and close it as duplicated. thanks Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS - Key: YARN-3667 URL: https://issues.apache.org/jira/browse/YARN-3667 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.8.0 Attachments: YARN-3667.000.patch Fix findbugs warning Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS This findbugs warning is reported at https://builds.apache.org/job/PreCommit-YARN-Build/7956/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3583) Support of NodeLabel object instead of plain String in YarnClient side.
[ https://issues.apache.org/jira/browse/YARN-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551687#comment-14551687 ] Sunil G commented on YARN-3583: --- Thank you very much [~leftnoteasy] for reviewing and committing the same! Support of NodeLabel object instead of plain String in YarnClient side. --- Key: YARN-3583 URL: https://issues.apache.org/jira/browse/YARN-3583 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.8.0 Attachments: 0001-YARN-3583.patch, 0002-YARN-3583.patch, 0003-YARN-3583.patch, 0004-YARN-3583.patch Similar to YARN-3521, use NodeLabel objects in YarnClient side apis. getLabelsToNodes/getNodeToLabels api's can use NodeLabel object instead of using plain label name. This will help to bring other label details such as Exclusivity to client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3677) Fix findbugs warnings in yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551465#comment-14551465 ] Tsuyoshi Ozawa commented on YARN-3677: -- Talked with Vinod and Arun offline. I understood that it's necessary change. Fix findbugs warnings in yarn-server-resourcemanager Key: YARN-3677 URL: https://issues.apache.org/jira/browse/YARN-3677 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Akira AJISAKA Assignee: Vinod Kumar Vavilapalli Priority: Minor Labels: newbie Attachments: YARN-3677-20150519.txt There is 1 findbugs warning in FileSystemRMStateStore.java. {noformat} Inconsistent synchronization of FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java: [line 156] Field org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS Synchronized 66% of the time Synchronized access at FileSystemRMStateStore.java: [line 148] Synchronized access at FileSystemRMStateStore.java: [line 859] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3686) CapacityScheduler should trim default_node_label_expression
[ https://issues.apache.org/jira/browse/YARN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3686: -- Attachment: 0001-YARN-3686.patch Hi [~leftnoteasy], I would like to take up this problem if that's fine. IMO we can trim the label in CapacitySchedulerConfiguration#getDefaultNodeLabelExpression to avoid this problem. Sharing a patch. Please share your opinion. CapacityScheduler should trim default_node_label_expression --- Key: YARN-3686 URL: https://issues.apache.org/jira/browse/YARN-3686 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-3686.patch We should trim default_node_label_expression for a queue before using it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
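A minimal sketch of the trimming idea from the comment above, assuming the raw configured value is already in hand; this is illustrative only and not the attached 0001-YARN-3686.patch.
{code}
// Illustrative only; the real getter lives in CapacitySchedulerConfiguration.
public final class NodeLabelExpressionTrim {

  public static String normalize(String configuredExpression) {
    if (configuredExpression == null) {
      return null;
    }
    String trimmed = configuredExpression.trim();
    // Treat a whitespace-only expression the same as an unset one.
    return trimmed.isEmpty() ? null : trimmed;
  }
}
{code}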
[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container
[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550390#comment-14550390 ] gu-chi commented on YARN-3678: -- I think decreasing the max_pid setting in the OS can increase the possibility of reproducing this; working on it. DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished; it will then do cleanup, but the PID file still exists and will trigger signalContainer once. This kills the process with the PID from the PID file, but as the container has already finished, that PID may by then be occupied by another process, which may cause serious issues. As far as I know, my NM was killed unexpectedly, and what I described can be the cause, even if it occurs rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned YARN-3679: --- Assignee: Mit Desai Add documentation for timeline server filter ordering - Key: YARN-3679 URL: https://issues.apache.org/jira/browse/YARN-3679 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Mit Desai Currently the auth filter is placed before the static user filter by default. After YARN-3624, the filter order is no longer reversed, so the pseudo auth's allow-anonymous config has no effect with both filters loaded in the new order, because the static user will be created before the request is presented to the auth filter. The user can remove the static user filter from the config to make anonymous users work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
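As a concrete illustration of the "remove the static user filter" workaround (an assumption about what the documentation would describe, not its final wording): the static user filter is loaded through hadoop.http.filter.initializers, whose shipped default is org.apache.hadoop.http.lib.StaticUserWebFilter, so overriding that property with an empty value leaves only the auth filter in the chain.
{code}
<!-- core-site.xml (sketch): drop the StaticUserWebFilter so the pseudo auth
     filter can see, and allow, anonymous users. The empty value overrides the
     default of org.apache.hadoop.http.lib.StaticUserWebFilter. -->
<property>
  <name>hadoop.http.filter.initializers</name>
  <value></value>
</property>
{code}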
[jira] [Comment Edited] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550380#comment-14550380 ] Devaraj K edited comment on YARN-41 at 5/19/15 12:53 PM: - Updated the patch with checkstyle fixes. was (Author: devaraj.k): Updated the patch checkstyle fixes. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550500#comment-14550500 ] Hudson commented on YARN-3541: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2130 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2130/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java Add version info on timeline service / generic history web UI and REST API -- Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.8.0 Attachments: YARN-3541.1.patch, YARN-3541.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3630) YARN should suggest a heartbeat interval for applications
[ https://issues.apache.org/jira/browse/YARN-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated YARN-3630: -- Attachment: YARN-3630.001.patch.patch Initial patch, with the adaptive heartbeat policy left unimplemented. If we decide to implement a good enough adaptive heartbeat policy, this jira would depend on YARN-3652, where we have enough information about the scheduler's load to determine the heartbeat interval. YARN should suggest a heartbeat interval for applications - Key: YARN-3630 URL: https://issues.apache.org/jira/browse/YARN-3630 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Affects Versions: 2.7.0 Reporter: Zoltán Zvara Assignee: Xianyin Xin Priority: Minor Attachments: YARN-3630.001.patch.patch It seems applications - for example Spark - are currently not adaptive to the RM regarding heartbeat intervals. The RM should be able to suggest a desired heartbeat interval to applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
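Since the adaptive policy itself is left unimplemented in the attached patch, the following is only a hypothetical sketch of what such a policy could look like; the class name, the thresholds, and the idea of deriving a 0-1 load factor from the metrics YARN-3652 would expose are all assumptions.
{code}
public final class HeartbeatIntervalPolicy {
  private final long minIntervalMs;
  private final long maxIntervalMs;

  public HeartbeatIntervalPolicy(long minIntervalMs, long maxIntervalMs) {
    this.minIntervalMs = minIntervalMs;
    this.maxIntervalMs = maxIntervalMs;
  }

  /**
   * @param schedulerLoad a 0.0-1.0 load factor, e.g. derived from the scheduler
   *                      metrics YARN-3652 would expose (assumption)
   * @return the interval the RM would suggest to the application
   */
  public long suggestIntervalMs(double schedulerLoad) {
    double clamped = Math.max(0.0, Math.min(1.0, schedulerLoad));
    // Busier scheduler -> longer interval, i.e. fewer allocate() round trips per second.
    return minIntervalMs + Math.round(clamped * (maxIntervalMs - minIntervalMs));
  }

  public static void main(String[] args) {
    HeartbeatIntervalPolicy policy = new HeartbeatIntervalPolicy(100, 3000);
    System.out.println(policy.suggestIntervalMs(0.1)); // lightly loaded -> 390 ms
    System.out.println(policy.suggestIntervalMs(0.9)); // heavily loaded -> 2710 ms
  }
}
{code}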
[jira] [Updated] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lavkesh Lahngir updated YARN-3591: -- Target Version/s: 2.8.0 (was: 2.7.1) Affects Version/s: (was: 2.6.0) 2.7.0 Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, YARN-3591.2.patch It happens when a resource is localised on a disk and, after localisation, that disk goes bad. The NM keeps paths for localised resources in memory. At the time of a resource request, isResourcePresent(rsrc) will be called, which calls file.exists() on the localised path. In some cases when the disk has gone bad, inodes are still cached and file.exists() returns true, but at the time of reading, the file will not open. Note: file.exists() actually calls stat64 natively, which returns true because it was able to find inode information from the OS. A proposal is to call file.list() on the parent path of the resource, which will call open() natively. If the disk is good it should return an array of paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
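A minimal sketch of the proposed check (not the attached patches): replace the file.exists() call with a listing of the parent directory, which forces a native open() and therefore fails on a bad disk instead of answering from cached inode data.
{code}
import java.io.File;

public class LocalResourceCheck {
  public static boolean isResourcePresent(File localizedPath) {
    File parent = localizedPath.getParentFile();
    if (parent == null) {
      return false;
    }
    // list() opens the directory natively; on a failed disk this returns null
    // instead of the stale "true" that exists() can report from cached inodes.
    String[] children = parent.list();
    if (children == null || children.length == 0) {
      return false;
    }
    for (String child : children) {
      if (child.equals(localizedPath.getName())) {
        return true;
      }
    }
    return false;
  }
}
{code}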
[jira] [Assigned] (YARN-3605) _ as method name may not be supported much longer
[ https://issues.apache.org/jira/browse/YARN-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-3605: --- Assignee: Devaraj K _ as method name may not be supported much longer - Key: YARN-3605 URL: https://issues.apache.org/jira/browse/YARN-3605 Project: Hadoop YARN Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Devaraj K I was trying to run the precommit test on my mac under JDK8, and I got the following error related to javadocs. (use of '_' as an identifier might not be supported in releases after Java SE 8) It looks like we need to at least change the method name to not be '_' any more, or possibly replace the HTML generation with something more standard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
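A minimal illustration of the warning; the renamed method below is just an illustrative spelling, not necessarily the one the project chose.
{code}
public class HamletLikeExample {
  // javac 8 warns: "'_' used as an identifier ... may not be supported in
  // releases after Java SE 8"; later JDKs reject it outright.
  public HamletLikeExample _() {
    return this; // closes the current element in the fluent HTML-generation style
  }

  // A forward-compatible spelling of the same "close element" method.
  public HamletLikeExample __() {
    return this;
  }
}
{code}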
[jira] [Created] (YARN-3678) DelayedProcessKiller may kill other process other than container
gu-chi created YARN-3678: Summary: DelayedProcessKiller may kill other process other than container Key: YARN-3678 URL: https://issues.apache.org/jira/browse/YARN-3678 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: gu-chi Priority: Critical Suppose one container finished and then does cleanup: the PID file still exists and will trigger a signalContainer, which kills the process with the PID from the PID file. But as the container has already finished, that PID may have been reused by another process, and this can cause serious issues. As far as I know, my NM was killed unexpectedly, and what I described can be the cause, even if it occurs only rarely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
[ https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550479#comment-14550479 ] Mit Desai commented on YARN-3624: - Correction: YARN-3679 ApplicationHistoryServer reverses the order of the filters it gets -- Key: YARN-3624 URL: https://issues.apache.org/jira/browse/YARN-3624 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-3624.patch ApplicationHistoryServer should not alter the order in which it gets the filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-41: -- Attachment: YARN-41-6.patch Updated the patch checkstyle fixes. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3679) Add documentation for timeline server filter ordering
Mit Desai created YARN-3679: --- Summary: Add documentation for timeline server filter ordering Key: YARN-3679 URL: https://issues.apache.org/jira/browse/YARN-3679 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Currently the auth filter is placed before the static user filter by default. After YARN-3624, the filter order is no longer reversed, so the pseudo auth's allow-anonymous config has no effect with both filters loaded in the new order, because the static user will be created before the request is presented to the auth filter. The user can remove the static user filter from the config to make anonymous users work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3624) ApplicationHistoryServer reverses the order of the filters it gets
[ https://issues.apache.org/jira/browse/YARN-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550447#comment-14550447 ] Mit Desai commented on YARN-3624: - Filed YARN-2679 ApplicationHistoryServer reverses the order of the filters it gets -- Key: YARN-3624 URL: https://issues.apache.org/jira/browse/YARN-3624 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-3624.patch ApplicationHistoryServer should not alter the order in which it gets the filter chain. Additional filters should be added at the end of the chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550529#comment-14550529 ] Hudson commented on YARN-3541: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #190 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/190/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java Add version info on timeline service / generic history web UI and REST API -- Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.8.0 Attachments: YARN-3541.1.patch, YARN-3541.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3679) Add documentation for timeline server filter ordering
[ https://issues.apache.org/jira/browse/YARN-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550614#comment-14550614 ] Hadoop QA commented on YARN-3679: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 2m 54s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 55s | Site still builds. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 6m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733835/YARN-3679.patch | | Optional Tests | site | | git revision | trunk / de30d66 | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8000/console | This message was automatically generated. Add documentation for timeline server filter ordering - Key: YARN-3679 URL: https://issues.apache.org/jira/browse/YARN-3679 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-3679.patch Currently the auth filter is before static user filter by default. After YARN-3624, the filter order is no longer reversed. So the pseudo auth's allowing anonymous config is useless with both filters loaded in the new order, because static user will be created before presenting it to auth filter. The user can remove static user filter from the config to get anonymous user work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550558#comment-14550558 ] Hadoop QA commented on YARN-41: --- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 54s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 16s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 49s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 5m 59s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 50m 15s | Tests passed in hadoop-yarn-server-resourcemanager. | | {color:green}+1{color} | yarn tests | 1m 51s | Tests passed in hadoop-yarn-server-tests. | | | | 99m 0s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | | Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time Unsynchronized access at FileSystemRMStateStore.java:66% of time Unsynchronized access at FileSystemRMStateStore.java:[line 156] | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12733802/YARN-41-6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / de30d66 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/7998/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7998/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7998/console | This message was automatically 
generated. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch Instead of waiting for the NM expiry, RM should remove and handle the NM, which is shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3541) Add version info on timeline service / generic history web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550589#comment-14550589 ] Hudson commented on YARN-3541: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #200 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/200/]) YARN-3541. Add version info on timeline service / generic history web UI and REST API. Contributed by Zhijie Shen (xgong: rev 76afd28862c1f27011273659a82cd45903a77170) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AboutBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineAbout.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/timeline/TimelineUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java Add version info on timeline service / generic history web UI and REST API -- Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.8.0 Attachments: YARN-3541.1.patch, YARN-3541.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)