[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322923#comment-14322923 ] Devaraj K commented on YARN-1299: - Thanks [~ozawa] for review. Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
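For illustration, a minimal sketch of the kind of change under review here (the exact message wording in the committed patch may differ; applicationId is assumed to be the AppSchedulingInfo field):
{code}
// AppSchedulingInfo (sketch): include the application id in the message so
// that RM log lines can be correlated with a specific application.
LOG.info("checking for deactivate of application: " + applicationId);
{code}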
[jira] [Commented] (YARN-1299) Improve a log message in AppSchedulingInfo by adding application id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323017#comment-14323017 ] Hudson commented on YARN-1299: -- FAILURE: Integrated in Hadoop-trunk-Commit #7120 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7120/]) Revert YARN-1299. Improve a log message in AppSchedulingInfo by adding application id. Contributed by Ashutosh Jindal and devaraj. (ozawa: rev 3f32357c368f4efac33835d719641c961f93a0be) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java YARN-1299. Improve a log message in AppSchedulingInfo by adding application id. Contributed by Ashutosh Jindal and Devaraj K. (ozawa: rev 556386a07084b70a5d2ae0c2bd4445a348306db8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java Improve a log message in AppSchedulingInfo by adding application id --- Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323050#comment-14323050 ] zhihai xu commented on YARN-1778: - [~ozawa], that is a good idea. Although we can increase dfs.client.block.write.locateFollowingBlock.retries in the configuration file, and FileSystemRMStateStore will pick up the change in startInternal from the configuration file in the following code, it would affect all the other modules as well. That may not be feasible. {code} Configuration conf = new Configuration(getConfig()); fs = fsWorkingPath.getFileSystem(conf); {code} To increase the flexibility, we can create a new configuration to customize dfs.client.block.write.locateFollowingBlock.retries for FileSystemRMStateStore, similar to how FS_RM_STATE_STORE_RETRY_POLICY_SPEC customizes dfs.client.retry.policy.spec for FileSystemRMStateStore in the following code from startInternal: {code} String retryPolicy = conf.get(YarnConfiguration.FS_RM_STATE_STORE_RETRY_POLICY_SPEC, YarnConfiguration.DEFAULT_FS_RM_STATE_STORE_RETRY_POLICY_SPEC); conf.set("dfs.client.retry.policy.spec", retryPolicy); {code} I will implement a new patch based on this. Thanks for the suggestion. zhihai TestFSRMStateStore fails on trunk - Key: YARN-1778 URL: https://issues.apache.org/jira/browse/YARN-1778 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: zhihai xu Attachments: YARN-1778.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
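A sketch of what the proposed override could look like in startInternal; the YARN key name and default value below are illustrative placeholders, not committed constants:
{code}
// Sketch (hypothetical key and default): mirror the FS_RM_STATE_STORE_RETRY_POLICY_SPEC
// pattern so only the RM state store's DFS client gets the larger retry count.
int blockWriteRetries = conf.getInt(
    "yarn.resourcemanager.fs.state-store.block-write-retries", 10);
conf.setInt("dfs.client.block.write.locateFollowingBlock.retries",
    blockWriteRetries);
fs = fsWorkingPath.getFileSystem(conf);
{code}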
[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322973#comment-14322973 ] Tsuyoshi OZAWA commented on YARN-1299: -- The Findbugs warnings are not related to the patch. We don't need tests since this is an improvement to a log message. Committing this shortly. Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322970#comment-14322970 ] Hadoop QA commented on YARN-1299: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699109/YARN-1299.patch against trunk revision 447bd7b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6644//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6644//console This message is automatically generated. Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
Brahma Reddy Battula created YARN-3204: -- Summary: Fix new findbug warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Please check the following findbug report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323051#comment-14323051 ] Varun Saxena commented on YARN-3204: Linking it to YARN-3181 Fix new findbug warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Please check the following findbug report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1299) Improve a log message in AppSchedulingInfo by adding application id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1299: - Summary: Improve a log message in AppSchedulingInfo by adding application id (was: Improve 'checking for deactivate...' log message by adding app id) Improve a log message in AppSchedulingInfo by adding application id --- Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1299) Improve a log message in AppSchedulingInfo by adding application id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322997#comment-14322997 ] Hudson commented on YARN-1299: -- FAILURE: Integrated in Hadoop-trunk-Commit #7119 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7119/]) YARN-1299. Improve a log message in AppSchedulingInfo by adding application id. Contributed by Ashutosh Jindal and devaraj. (ozawa: rev 9aae81c93421874b726c7b6ff970895c429e502d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java Improve a log message in AppSchedulingInfo by adding application id --- Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322929#comment-14322929 ] Devaraj K commented on YARN-3197: - {code:xml} protected synchronized void completedContainer(RMContainer rmContainer, ContainerStatus containerStatus, RMContainerEventType event) { if (rmContainer == null) { LOG.info("Null container completed..."); return; } {code} Here this log can be updated with the containerId from ContainerStatus along with a more meaningful message. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
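A sketch of the suggested change (the exact message wording is up for review):
{code}
// Sketch: name the completed container so the log line is actionable.
if (rmContainer == null) {
  LOG.info("Container " + containerStatus.getContainerId()
      + " completed with event " + event
      + ", but corresponding RMContainer doesn't exist.");
  return;
}
{code}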
[jira] [Updated] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1299: - Fix Version/s: 2.7.0 Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323076#comment-14323076 ] Craig Welch commented on YARN-2495: --- My point is that everything necessary to manage labels properly exists without DECENTRALIZED_CONFIGURATION_ENABLED; it is a duplication of existing functionality. The user controls this by: 1. choosing to specify or not specify a way of managing the nodes at the node manager 2. choosing to set or not set node labels and associations using the centralized APIs. Ergo, DECENTRALIZED_CONFIGURATION_ENABLED is completely redundant; it provides no capabilities not already present. Users will need to understand how the feature works to use it effectively anyway; there is no value added by requiring that they repeat themselves (both by specifying a way of determining node labels at the node manager level and by having to set this switch). My prediction is that, if the switch is present, its chief function will be to confuse and annoy users when they set up a configuration for the node managers to generate node labels and then the labels don't appear in the cluster as they expect them to. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels in each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using a script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-65) Reduce RM app memory footprint once app has completed
[ https://issues.apache.org/jira/browse/YARN-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-65: - Assignee: Devaraj K Reduce RM app memory footprint once app has completed - Key: YARN-65 URL: https://issues.apache.org/jira/browse/YARN-65 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Devaraj K The ResourceManager holds onto a configurable number of completed applications (yarn.resourcemanager.max-completed-applications, defaults to 10000), and the memory footprint of these completed applications can be significant. For example, the {{submissionContext}} in RMAppImpl contains references to protocolbuffer objects and other items that probably aren't necessary to keep around once the application has completed. We could significantly reduce the memory footprint of the RM by releasing objects that are no longer necessary once an application completes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3076) YarnClient implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3076: --- Attachment: YARN-3076.003.patch YarnClient implementation to retrieve label to node mapping --- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3194) After NM restart, completed containers are not released which are sent during NM registration
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3194: - Attachment: 0001-yarn-3194-v1.patch Attached the version-1 patch. The patch does the following: # Added ReconnectedEvent to process NMContainerStatus if applications are running on the node. Kindly review the patch. After NM restart, completed containers are not released which are sent during NM registration Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Attachments: 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatus to the RM. But the RM processes only ContainerState.RUNNING. If a container completed while the NM was down, those containers' resources won't be released, which results in applications hanging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
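Roughly, the fix direction is for the RM to also hand completed containers reported at registration to the scheduler; a sketch under assumed names (handleNMContainerStatus is an illustrative helper, not necessarily the patch's method):
{code}
// Sketch: on NM (re)registration, don't ignore COMPLETE container reports;
// release their resources instead of processing only RUNNING ones.
for (NMContainerStatus status : request.getNMContainerStatuses()) {
  if (status.getContainerState() == ContainerState.COMPLETE) {
    handleNMContainerStatus(status);  // assumed helper: forwards to the scheduler
  }
}
{code}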
[jira] [Commented] (YARN-3041) [Data Model] create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323157#comment-14323157 ] Naganarasimha G R commented on YARN-3041: - # After having HierarchicalTimelineEntity, do we require isRelatedToEntities and relatesToEntities in TimelineEntity, or vice versa? {quote} private Set<TimelineEntity> isRelatedToEntities = new HashSet<>(); private Set<TimelineEntity> relatesToEntities = new HashSet<>(); {quote} # If any entity data cannot be updated on subsequent posts of timeline entities, it is better to capture that beforehand; for example, if we are inserting configs of a timeline entity only during creation of a new TimelineEntity... # Regarding metrics: TimelineEntity has a set of TimelineMetric, and TimelineMetric has {quote} private String id; private Map<String, Object> info = new HashMap<>(); private Object singleData; private Map<Long, Object> timeSeries = new LinkedHashMap<>(); {quote} #* What's the purpose of info? Can we rename it to metadata? #* Are all objects stored in the backend via serialization and deserialization, or as JSON strings? #* If the metric value is an Object, then how are (primary) aggregations done? Is info responsible for capturing this information? #* IIUC, for a time-series metric {{singleData}} will be null and {{timeSeries}} will have values, and for a non-time-series metric vice versa. [Data Model] create the ATS entity/event API Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: Data_model_proposal_v2.pdf, YARN-3041.2.patch, YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs
[ https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2031: - Attachment: YARN-2031-002.patch This is an iteration which implements part of the feature; not complete, but posted for interim review. # The AmIpFilter now redirects with the relevant verb, as tested. # The proxy is lined up for it, except that it still only registers support for GET. # The redirect code in ProxyUtils is now method aware. There's some complexity in the proxy related to redirect policy for YARN pages and user click-throughs. h3. Click throughs: How to handle the click-through warning on non-GET operations. Current policy: reject with 401. The warn logic could also probe the accepted types of the GET, and 401 on anything that wanted XML or JSON, so app APIs would fail fast. Thoughts? h3. Redirecting to RM pages vs app pages RM pages are: the app-not-registered redirect to the RM page, or the app-completed redirect to the logs. For GET operations, these are redirected as today, with a 302. For other verbs, a 404 on the original URL is returned. This is designed to fail when an app isn't running, either not-started or completed. YARN Proxy model doesn't support REST APIs in AMs - Key: YARN-2031 URL: https://issues.apache.org/jira/browse/YARN-2031 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2031-002.patch, YARN-2031.patch.001 AMs can't support REST APIs because # the AM filter redirects all requests to the proxy with a 302 response (not 307) # the proxy doesn't forward PUT/POST/DELETE verbs Either the AM filter needs to return 307 and the proxy to forward the verbs, or the AM filter should not filter the REST bit of the web site -- This message was sent by Atlassian JIRA (v6.3.4#6332)
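To illustrate the verb-aware redirect idea (a sketch, not the actual AmIpFilter/ProxyUtils code; request, response, and redirectUrl are assumed to be in scope):
{code}
// Sketch: 302 is fine for GET/HEAD, but other verbs need 307 so clients
// replay the request against the new location with the same method and body.
String method = request.getMethod();
if ("GET".equals(method) || "HEAD".equals(method)) {
  response.setStatus(HttpServletResponse.SC_FOUND);               // 302
} else {
  response.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);  // 307
}
response.setHeader("Location", redirectUrl);
{code}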
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323124#comment-14323124 ] Junping Du commented on YARN-914: - Thanks [~jlowe] for review and comments! bq. Nit: How about DECOMMISSIONING instead of DECOMMISSION_IN_PROGRESS? Sounds good. Will update it later. bq. We should remove its available (not total) resources from the cluster then continue to remove available resources as containers complete on that node. That's a very good point. Yes, we should update resources in this way. bq. As for the UI changes, initial thought is that decommissioning nodes should still show up in the active nodes list since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jquery table to sort/search for nodes in that state from the active nodes list if it's too crowded to add yet another node state tab (or maybe get rid of some effectively dead tabs like the reboot state tab). Makes sense. Will add it to the proposal and we can discuss more details on the UI JIRA later. bq. For the NM restart open question, this should no longer be an issue now that the NM is unaware of graceful decommission. Right. bq. For the AM dealing with being notified of decommissioning, again I think this should just be treated like a strict preemption for the short term. IMHO all the AM needs to know is that the RM is planning on taking away those containers, and what the AM should do about it is similar whether the reason for removal is preemption or decommissioning. bq. Back to the long running services delaying decommissioning concern, does YARN even know the difference between a long-running container and a normal container? I am afraid not now. YARN-1039 should be a start to do the differentiation. bq. If it doesn't, how is it supposed to know a container is not going to complete anytime soon? Even a normal container could run for many hours. It seems to me the first thing we would need before worrying about this scenario is the ability for YARN to know/predict the expected runtime of containers. I think prediction of the expected runtime of containers could be hard in YARN's case. However, can we typically say long running service containers are expected to run very long or infinitely? If so, notifying the AM to preempt containers of LRS makes more sense than waiting for a timeout, doesn't it? bq. There's still an open question about tracking the timeout RM side instead of NM side. Sounds like the NM side is not going to be pursued at this point, and we're going with no built-in timeout support in YARN for the short-term. That was unclear at the beginning of the discussion but is much clearer now; I will remove this part. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs are not fetched by the reducers of the job, these map tasks will need to be rerun as well. 
We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs
[ https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323137#comment-14323137 ] Hadoop QA commented on YARN-2031: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699147/YARN-2031-002.patch against trunk revision 814afa4. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6646//console This message is automatically generated. YARN Proxy model doesn't support REST APIs in AMs - Key: YARN-2031 URL: https://issues.apache.org/jira/browse/YARN-2031 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2031-002.patch, YARN-2031.patch.001 AMs can't support REST APIs because # the AM filter redirects all requests to the proxy with a 302 response (not 307) # the proxy doesn't forward PUT/POST/DELETE verbs Either the AM filter needs to return 307 and the proxy to forward the verbs, or the AM filter should not filter the REST bit of the web site -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323154#comment-14323154 ] Naganarasimha G R commented on YARN-3034: - Hi [~sjlee0] [~zjshen], thanks for reviewing the patch. bq. If aggregator is able to handle the requests in the async way, I'm okay to use rmcontext dispatcher. Otherwise, let's make sure at least we're using a separate async dispatcher. +1 for having a separate async dispatcher, as we are not planning to handle container events in RM anyway. bq. this creates a dependency from RM to the timeline service; perhaps it is unavoidable... Based on the discussions we had over the last week, I understand that RM and NM should not be directly dependent on TimelineService. But based on the YARN-3030 patch, BaseAggregatorService.java is in the timeline service project; hence, where should this RMTimelineAggregator.java class be placed (as it extends BaseAggregatorService)? If we plan to handle it similar to the current approach, i.e. send the entity data through a REST client to a timeline writer service (RMTimelineAggregator), where should this service be running, i.e. as part of which process, or should it be a daemon on its own? Other queries: # Is RMTimelineAggregator expected to do any primary (preliminary) aggregation of some metrics? I just wanted to know the reason for having a specific TimelineAggregator for RM separately. Similarly for NM/Applications too: what if there are no primary aggregations and we just want to push the entity data to ATS; in these cases do we require separate per-app service handling? # User and Queue entities have been newly added in the YARN-3041 data model proposal: IIUC, RM needs to add User and Queue entities when an application is created if the specified user and queue don't exist as entities in ATS? Apart from this, the Queue entity has parent-queue information; is it something like when CS/FS is initialized we need to create entities for new queues and hierarchies? Is it not sufficient to just have a Leaf Queue entity with the parent path as its meta info; is the hierarchy required? Based on clarification of these points, I can rework the patch along with fixing other small issues. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1514: - Attachment: YARN-1514.7.patch * Fixing the state-management bug (cleanup works well). * Removing ZK_TIMEOUT_MS. * Using ContainerId.newContainerId instead of ContainerId.newInstance. * Fixing up default values more naturally: ZK_PERF_NUM_APP_DEFAULT is 1000, ZK_PERF_NUM_APPATTEMPT_PER_APP is 10. About the excessive log messages: they can be suppressed with the hadoop --loglevel option: {code} $ bin/hadoop --loglevel fatal jar ../../../hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/hadoop-yarn-server-resourcemanager-3.0.0-SNAPSHOT-tests.jar TestZKRMStateStorePerf -appSize 5 -appAttemptSize 10 -workingZnode /Test3 ZKRMStateStore takes 39 msec to loadState. {code} Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when an RM-HA cluster does failover. Therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3076) YarnClient implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323211#comment-14323211 ] Hadoop QA commented on YARN-3076: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699124/YARN-3076.003.patch against trunk revision 447bd7b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.conf.TestJobConf Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6645//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6645//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6645//console This message is automatically generated. YarnClient implementation to retrieve label to node mapping --- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2832) Wrong Check Logic of NodeHealthCheckerService Causes Latent Errors
[ https://issues.apache.org/jira/browse/YARN-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323092#comment-14323092 ] Devaraj K commented on YARN-2832: - Nice catch [~tianyin]. Thanks for your contribution. The patch looks good to me except these comments. - Can you change the log level to INFO with a log message similar to the one in NodeHealthScriptRunner.serviceStart(): {code} LOG.info("Not starting node health monitor"); {code} - And also can you remove the redundant shouldRun() check in NodeHealthScriptRunner.serviceStart(). Wrong Check Logic of NodeHealthCheckerService Causes Latent Errors -- Key: YARN-2832 URL: https://issues.apache.org/jira/browse/YARN-2832 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1, 2.5.1 Environment: Any environment Reporter: Tianyin Xu Attachments: health.check.service.1.patch NodeManager allows users to specify the health checker script that will be invoked by the health-checker service via the configuration parameter, _yarn.nodemanager.health-checker.script.path_. During the _serviceInit()_ of the health-check service, NM checks whether the parameter is set correctly using _shouldRun()_, as follows: {code:title=/* NodeHealthCheckerService.java */|borderStyle=solid} protected void serviceInit(Configuration conf) throws Exception { if (NodeHealthScriptRunner.shouldRun(conf)) { nodeHealthScriptRunner = new NodeHealthScriptRunner(); addService(nodeHealthScriptRunner); } addService(dirsHandler); super.serviceInit(conf); } {code} The problem is that if the parameter is misconfigured (e.g., permission problem, wrong path), NM does not have any log message to inform users, which could cause latent errors or mysterious problems (e.g., why does my script not work?). I see the checking and printing logic is put in the _serviceStart()_ function in _NodeHealthScriptRunner.java_ (see the following code snippets). However, the logic is wrong. For an incorrect parameter that does not pass the shouldRun check, _serviceStart()_ would never be called because the _NodeHealthScriptRunner_ instance does not have the chance to be created (see the code snippets above). {code:title=/* NodeHealthScriptRunner.java */|borderStyle=solid} protected void serviceStart() throws Exception { // if health script path is not configured don't start the thread. if (!shouldRun(conf)) { LOG.info("Not starting node health monitor"); return; } ... } {code} Basically, I think the checking and printing logic should be put in serviceInit() in NodeHealthCheckerService instead of serviceStart() in NodeHealthScriptRunner. See the attachment for the simple patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
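Combining the reporter's patch direction with the review comments, the serviceInit change might look roughly like this (a sketch, not the final patch):
{code}
// NodeHealthCheckerService#serviceInit (sketch): log at INFO when the
// health-script configuration doesn't pass the shouldRun() check.
protected void serviceInit(Configuration conf) throws Exception {
  if (NodeHealthScriptRunner.shouldRun(conf)) {
    nodeHealthScriptRunner = new NodeHealthScriptRunner();
    addService(nodeHealthScriptRunner);
  } else {
    LOG.info("Not starting node health monitor");
  }
  addService(dirsHandler);
  super.serviceInit(conf);
}
{code}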
[jira] [Commented] (YARN-3194) After NM restart, completed containers are not released which are sent during NM registration
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323236#comment-14323236 ] Hadoop QA commented on YARN-3194: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699148/0001-yarn-3194-v1.patch against trunk revision 814afa4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6647//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6647//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6647//console This message is automatically generated. After NM restart, completed containers are not released which are sent during NM registration Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Attachments: 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatus to the RM. But the RM processes only ContainerState.RUNNING. If a container completed while the NM was down, those containers' resources won't be released, which results in applications hanging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323315#comment-14323315 ] Hadoop QA commented on YARN-1514: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699153/YARN-1514.7.patch against trunk revision 814afa4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6648//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6648//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6648//console This message is automatically generated. Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when an RM-HA cluster does failover. Therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323700#comment-14323700 ] Naganarasimha G R commented on YARN-3040: - Hi [~rkanter] Some queries related to tags and this JIRA: # IIUC, users create the Flow and Flow run entities externally and just give these ids as tags at the time of app submission, so during creation of the app we ensure hierarchies are updated properly. If my understanding is correct, then what's the way a user can create Flow, Flow run and Cluster? Or is it that all the data related to the Flow, Flow run and Cluster is passed as part of tags, and if it's not present we need to create the entities for them at the time of app submission? # Hopefully the limitations of tags (100-character size and ASCII-only support) should not be a concern for passing the information to YARN, but it is better to capture this if we are considering tags as the interface for passing flow and flow run information. # IMHO I would have liked an explicit interface for clients to pass this information rather than tags. Even though tags might serve the purpose, they don't seem like a graceful interface for clients. [Data Model] Implement client-side API for handling flows - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
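As a concrete illustration of the tag-based approach being discussed (the tag prefixes below are made up for the example, not a defined convention; appContext is an assumed ApplicationSubmissionContext):
{code}
// Sketch: a client passing flow identity to YARN as application tags
// at submission time, using the existing application-tags API.
Set<String> tags = new HashSet<>();
tags.add("TIMELINE_FLOW_NAME_TAG:my-flow");      // hypothetical tag prefix
tags.add("TIMELINE_FLOW_RUN_ID_TAG:1423686851"); // hypothetical tag prefix
appContext.setApplicationTags(tags);
{code}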
[jira] [Updated] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2820: Description: Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} It would be better to improve FileSystemRMStateStore to configure dfs.client.block.write.locateFollowingBlock.retries to a bigger value for better error recovery. The default value for dfs.client.block.write.locateFollowingBlock.retries is 5. {code} public static final int DFS_CLIENT_BLOCK_WRITE_LOCATEFOLLOWINGBLOCK_RETRIES_DEFAULT = 5; {code} was: Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
[jira] [Updated] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2820: Description: Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} It would be better to improve FileSystemRMStateStore to configure dfs.client.block.write.locateFollowingBlock.retries to a bigger value for better error recovery. The default value for dfs.client.block.write.locateFollowingBlock.retries is 5. {code} public static final int DFS_CLIENT_BLOCK_WRITE_LOCATEFOLLOWINGBLOCK_RETRIES_DEFAULT = 5; {code} was: Improve FileSystemRMStateStore to do retrying for better error recovery when update/store failures occur due to IOException from HDFS. As discussed at YARN-1778, the TestFSRMStateStore failure is also due to IOException from HDFS in storeApplicationStateInternal. We will address YARN-1778 in this JIRA also. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM shutdown. {code} FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323321#comment-14323321 ] Xuan Gong commented on YARN-2261: - Thanks for the comments, Steve. bq. Maybe the cleanup containers could have lower limits on allocation: 1 vcore max...I'd advocate less memory, but if pmem limits are turned on that's dangerous. bq. would there be any actual/best effort offerings of the interval between AM termination and clean up scheduling? I thought about this. * Request the resource for the clean-up container separately after the application is finished/failed/killed. In this case, the clean-up container can have its own resource requirement. As per Vinod's comment, the clean-up container may not get resources because the cluster may have gotten busy after the final AM exits. * Request the resource for the clean-up container at the same time as we request the resource for the AM container. We can reserve the resource for the clean-up container; after the final AM exits, we use this reserved resource to launch the clean-up container. In this case, the clean-up container can have its own resource requirement. But this option is not ideal, because the AM does not know whether it is the final attempt. Even the RM does not know whether the current attempt is the final one or not; the RM only knows whether the previous attempt is final when it decides whether it needs to launch the next attempt. So, we would need to request the resource for the clean-up container every time we request the resource for the AM container. If the current AM container is not the final one, we will waste the resource. * Reuse the AM container resource as I proposed. If we have the container-resource-resize feature ready, we could definitely let the clean-up container have its own resource requirement. Those are all the options that I can think of for clean-up container scheduling, and that is why I propose that we just reuse the AM container resource. bq. My token concern is related to long lived apps: what tokens will they get? Currently, we could just give all the latest tokens which the AM has. I understand that for LRS apps this is not enough, but I think the AM has a similar token renew/token update issue; we could fix those together. bq. How does this mix up with pre-emption? This is a good point. The resource for the clean-up container still belongs to the application. I think we could do either: * if the container is a clean-up container, we cannot pre-empt it, OR * if the clean-up container is pre-empted, we simply stop the clean-up process without retry and mark it as a clean-up failure. YARN should have a way to run post-application cleanup -- Key: YARN-2261 URL: https://issues.apache.org/jira/browse/YARN-2261 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli See MAPREDUCE-5956 for context. Specific options are at https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2123) Progress bars in Web UI always at 100% (likely due to non-US locale)
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323776#comment-14323776 ] Hadoop QA commented on YARN-2123: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699209/YARN-2123-001.patch against trunk revision 9729b24. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6649//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6649//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6649//console This message is automatically generated. Progress bars in Web UI always at 100% (likely due to non-US locale) Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Attachments: YARN-2123-001.patch, screenshot.png In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323788#comment-14323788 ] Tsuyoshi OZAWA commented on YARN-2820: -- [~xgong] [~zxu] Oh, I overlooked the point. Good point, Xuan. My first suggestion is to use [DFS-level retry|https://issues.apache.org/jira/browse/YARN-1778?focusedCommentId=14319725page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14319725], but if we support generic filesystems which are not related to HDFS, it looks better to implement RMStateStore-level retry as [~zxu] suggested firstly. Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} It will be better to Improve FileSystemRMStateStore to configure dfs.client.block.write.locateFollowingBlock.retries to a bigger value for better error recovery. The default value for
[jira] [Commented] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323792#comment-14323792 ] Hadoop QA commented on YARN-2820: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699212/YARN-2820.001.patch against trunk revision 9729b24. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6650//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6650//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6650//console This message is automatically generated. Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323548#comment-14323548 ] Tsuyoshi OZAWA commented on YARN-1514: -- findbugs and test failure look not related to the patch. [~jianhe], could you take a look? Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when RM-HA cluster does failover. Therefore, its execution time impacts failover time of RM-HA. We need utility to benchmark time execution time of ZKRMStateStore#loadStore as development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated YARN-3025: - Attachment: yarn-3025-v3.txt work in progress: need to add the PBImpl classes. Provide API for retrieving blacklisted nodes Key: YARN-3025 URL: https://issues.apache.org/jira/browse/YARN-3025 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt We have the following method which updates the blacklist: {code} public synchronized void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals) { {code} Upon AM failover, there should be an API which returns the blacklisted nodes so that the new AM can make consistent decisions. The new API can be: {code} public synchronized List<String> getBlacklistedNodes() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
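A short sketch of one way a restarted AM might use the proposed method together with the existing updateBlacklist call; getBlacklistedNodes() is the API proposed in this JIRA and is not part of the released AMRMClient interface, and java.util.List/Collections are assumed to be imported.
{code}
// Hypothetical usage after AM failover (sketch): recover the blacklist
// decisions made by the previous attempt and re-apply them on the new client.
List<String> blacklisted = amRMClient.getBlacklistedNodes();   // proposed API, not released
amRMClient.updateBlacklist(blacklisted, Collections.<String>emptyList());
{code}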
[jira] [Created] (YARN-3203) Correct the log message #AuxServices.java
Brahma Reddy Battula created YARN-3203: -- Summary: Correct the log message #AuxServices.java Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Currently the log looks like the following: WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since Class#toString() already returns a 'class ' prefix, we do not need to keep the literal 'class' in the log message. {code} Class<? extends AuxiliaryService> sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
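The "class class" duplication comes from logging the Class object itself: Class#toString() already prepends "class ", so a message that also contains the word prints it twice. A sketch of one way to avoid the duplication (not necessarily the committed patch); the surrounding names LOG, sName and service are assumed for illustration.
{code}
Class<? extends AuxiliaryService> sClass = conf.getClass(
    String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null,
    AuxiliaryService.class);
// Log the class name rather than the Class object (or drop the literal "class "
// from the message) so the output no longer reads "... is for class class ...".
LOG.warn("The Auxilurary Service named '" + sName + "' in the configuration is for "
    + sClass.getName() + " which has a name of '" + service.getName() + "'.");
{code}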
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322799#comment-14322799 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2038 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2038/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fails in trunk. Those can be reproduced in centos Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-1299: Attachment: YARN-1299.patch Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In RM log, it gives message saying 'checking for deactivate...'. It would give better meaning if this log message contains app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3203: --- Attachment: YARN-3203.patch Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3203: --- Priority: Minor (was: Major) Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3192) Empty handler for exception: java.lang.InterruptedException #WebAppProxy.java and #/ResourceManager.java
[ https://issues.apache.org/jira/browse/YARN-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322711#comment-14322711 ] Brahma Reddy Battula commented on YARN-3192: {quote} Signalling a clean shutdown is the desired action here, not exiting with a -1. Note also our use of the sole exit mechanism we allow in the Hadoop codebase, via a call to ExitUtil.terminate(-1, t);. That's new to branch-2+ as of this week; until then the code was errant. if you're going to touch join(), rather than have it throw, have it exit with a boolean to indicate managed shutdown vs interruption. It'll be ignored either way, but if it makes you confident the code is better, then I won't say no. {quote} +1 for this approach. Empty handler for exception: java.lang.InterruptedException #WebAppProxy.java and #/ResourceManager.java Key: YARN-3192 URL: https://issues.apache.org/jira/browse/YARN-3192 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3192.patch The InterruptedException is completely ignored. As a result, any events causing this interrupt will be lost. File: org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java {code} try { event = eventQueue.take(); } catch (InterruptedException e) { LOG.error("Returning, interrupted : " + e); return; // TODO: Kill RM. } {code} File: org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java {code} public void join() { if (proxyServer != null) { try { proxyServer.join(); } catch (InterruptedException e) { } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
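A small sketch of the kind of handling being discussed for the WebAppProxy case: rather than swallowing the exception, restore the thread's interrupt status (and log it) so callers can still observe the shutdown request. This is the conventional pattern, not the exact patch under review; the LOG field is assumed to exist on the class.
{code}
public void join() {
  if (proxyServer != null) {
    try {
      proxyServer.join();
    } catch (InterruptedException e) {
      // Don't lose the interrupt: record it and re-assert the thread's
      // interrupt status so a managed shutdown can still be detected.
      LOG.info("Interrupted while waiting for the proxy server to stop", e);
      Thread.currentThread().interrupt();
    }
  }
}
{code}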
[jira] [Commented] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322757#comment-14322757 ] Hadoop QA commented on YARN-3203: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699095/YARN-3203.patch against trunk revision ab0b958. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6643//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6643//console This message is automatically generated. Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2123) Progress bars in Web UI always at 100% (likely due to non-US locale)
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned YARN-2123: --- Assignee: Akira AJISAKA Progress bars in Web UI always at 100% (likely due to non-US locale) Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Priority: Minor Attachments: screenshot.png In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2123) Progress bars in Web UI always at 100% (likely due to non-US locale)
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2123: Attachment: YARN-2123-001.patch Attaching a patch to use {{String.format(Locale.US, format, objects)}} instead of {{String.format(format, objects)}}. I grepped %.1f and %.2f in yarn source code and fixed them. Progress bars in Web UI always at 100% (likely due to non-US locale) Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-2123-001.patch, screenshot.png In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
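A small illustration of the failure mode and of the fix described in the comment above; Locale.GERMANY is used here only as an example of a locale whose decimal mark is a comma, and java.util.Locale is assumed to be imported.
{code}
// With a non-US default locale the progress value is rendered with a comma,
// which browsers cannot parse as a CSS width, so the bar renders incorrectly.
String broken = String.format(Locale.GERMANY, "%.1f", 32.8f); // "32,8"
String fixed  = String.format(Locale.US, "%.1f", 32.8f);      // "32.8"
{code}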
[jira] [Updated] (YARN-2123) Progress bars in Web UI always at 100% (likely due to non-US locale)
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2123: Priority: Major (was: Minor) Progress bars in Web UI always at 100% (likely due to non-US locale) Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Attachments: YARN-2123-001.patch, screenshot.png In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323697#comment-14323697 ] Zhijie Shen commented on YARN-3166: --- Records is better to be in {{org.apache.hadoop.yarn.api.records.timelineservice.*}}? [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) [Data Model] create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323706#comment-14323706 ] Naganarasimha G R commented on YARN-3041: - Few other minor comments : # flow version is not captured as class member of FlowEntity # For FlowEntity bq. ACCEPTABLE_ENTITY_TYPES.add(ApplicationEntity.TYPE); Is this valid ? i was under the assumption that only FlowRun and cluster will be having ApplicationEntity as child [Data Model] create the ATS entity/event API Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: Data_model_proposal_v2.pdf, YARN-3041.2.patch, YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2820) Improve FileSystemRMStateStore to do retrying for better error recovery when update/store failure.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2820: Attachment: YARN-2820.001.patch Improve FileSystemRMStateStore to do retrying for better error recovery when update/store failure. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch Improve FileSystemRMStateStore to do retrying for better error recovery when update/store failure due to IOException from HDFS. As discussed at YARN-1778, TestFSRMStateStore failure is also due to IOException from HDFS in storeApplicationStateInternal. We will address YARN-1778 in this JIRA also. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} The IOexception from YARN-1778 is {code} 2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still not started at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:971) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2128) at org.apache.hadoop.ipc.Client.call(Client.java:1474) at org.apache.hadoop.ipc.Client.call(Client.java:1405) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy23.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at
[jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323719#comment-14323719 ] zhihai xu commented on YARN-1778: - Hi [~ozawa], I uploaded a new patch at YARN-2820. Could you review it? thanks TestFSRMStateStore fails on trunk - Key: YARN-1778 URL: https://issues.apache.org/jira/browse/YARN-1778 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: zhihai xu Attachments: YARN-1778.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323732#comment-14323732 ] Xuan Gong commented on YARN-2820: - [~zxu], [~ozawa] Thanks for working on this. I understand the problem. But I am not sure whether this is a good idea to do it. For using FileSystemRMStateStore, we are depended on the underly FileSystem (either HDFS or other distributed system). I think that we should be consistent with the configurations set for the FS. By changing the configuration, it will make it un-consistent. Do you think that is a good idea ? Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} It will be better to Improve FileSystemRMStateStore to configure dfs.client.block.write.locateFollowingBlock.retries to a bigger value for better error recovery. The default value for
[jira] [Created] (YARN-3205) FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration.
zhihai xu created YARN-3205: --- Summary: FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration. Key: YARN-3205 URL: https://issues.apache.org/jira/browse/YARN-3205 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu FileSystemRMStateStore should disable the FileSystem cache to avoid getting a FileSystem with an old configuration. The old configuration may not contain the customized DFS client configurations for FileSystemRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
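A sketch of what disabling the FileSystem cache amounts to here; the per-scheme fs.&lt;scheme&gt;.impl.disable.cache switch is a standard Hadoop setting, but the surrounding code is assumed for illustration rather than taken from the eventual patch.
{code}
// With the cache enabled, fsWorkingPath.getFileSystem(conf) may return an
// instance created earlier from a configuration that lacks the state store's
// customized dfs.client.* settings. Disabling the cache for this scheme
// forces a fresh FileSystem to be built from the local conf copy.
Configuration conf = new Configuration(getConfig());
conf.setBoolean("fs." + fsWorkingPath.toUri().getScheme() + ".impl.disable.cache", true);
fs = fsWorkingPath.getFileSystem(conf);
{code}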
[jira] [Updated] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1299: - Assignee: Devaraj K Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Assignee: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In RM log, it gives message saying 'checking for deactivate...'. It would give better meaning if this log message contains app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1299: - Assignee: (was: Devaraj K) Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In RM log, it gives message saying 'checking for deactivate...'. It would give better meaning if this log message contains app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322826#comment-14322826 ] Tsuyoshi OZAWA commented on YARN-3203: -- +1, committing this shortly. Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct the log message in AuxServices
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3203: - Summary: Correct the log message in AuxServices (was: Correct the log message #AuxServices.java) Correct the log message in AuxServices -- Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3203: - Issue Type: Improvement (was: Bug) Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322823#comment-14322823 ] Tsuyoshi OZAWA commented on YARN-1299: -- +1, pending for Jenkins. Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In RM log, it gives message saying 'checking for deactivate...'. It would give better meaning if this log message contains app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct a log message in AuxServices
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3203: - Summary: Correct a log message in AuxServices (was: Correct the log message in AuxServices) Correct a log message in AuxServices Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322831#comment-14322831 ] Jagadesh Kiran N commented on YARN-3195: Hi Devaraj K, thanks for your review; please find my analysis below. I have not considered the YARN commands that are direct executables and do not require help, e.g. ./yarn classpath xxx or ./yarn version. Please check the inconsistency below. *Help is present for these commands:* ./yarn container: -help is present; displays help when run with -help. ./yarn rmadmin: -help is present; displays help when run with -help. ./yarn application: -help is present; displays help when run with -help. ./yarn applicationattempt: -help is present; displays help when run with -help. ./yarn queue: -help is present; displays help when run. *Help is not present for these commands:* ./yarn: -help is missing; ./yarn -help displays the help. ./yarn node: -help is missing; ./yarn node -help throws the exception Unrecognized option: -help. ./yarn logs: -help is not present; ./yarn logs -help displays the help. ./yarn daemonlog: -help is not present; ./yarn daemonlog -help displays the help. *For the commands where it is not present, I want to add -help; please check and confirm so that I can go ahead.* [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch, YARN-3195.patch Help is a generic option and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help command inside ./yarn queue for uniformity with respect to the other commands. {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322839#comment-14322839 ] Allen Wittenauer commented on YARN-3168: Both. I start with a bunch of scripts I wrote + doxia-converter and then do a manual pass over it. Then I upload the patch, let someone else (usually the very awesome [~iwasakims]) do a second manual pass over it to fix the things I missed. Then I'll review it and commit it as appropriate, knowing that we can always go back and fix things in subsequent JIRAs since this is for trunk and not for branch-2. Keep in mind that *any delay* results in the source changing so the patch will no longer apply, and I'm out of town at the moment and won't be able to generate a new patch until next week. Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3203) Correct a log message in AuxServices
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322862#comment-14322862 ] Hudson commented on YARN-3203: -- FAILURE: Integrated in Hadoop-trunk-Commit #7118 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7118/]) YARN-3203. Correct a log message in AuxServices. Contributed by Brahma Reddy Battula. (ozawa: rev 447bd7b5a61a5788dc2a5d29cedfc19f0e99c0f5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java Correct a log message in AuxServices Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.7.0 Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3203) Correct a log message in AuxServices
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322866#comment-14322866 ] Brahma Reddy Battula commented on YARN-3203: Thanks a lot [~ozawa]!!! Correct a log message in AuxServices Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.7.0 Attachments: YARN-3203.patch Currently the log message comes out like the following: WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since getClass() already yields a 'class ' prefix, we need not keep the literal 'class' in the log message. {code} Class<? extends AuxiliaryService> sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322881#comment-14322881 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2057 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2057/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk; the failures can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322879#comment-14322879 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #107 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/107/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk; the failures can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322631#comment-14322631 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #106 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/106/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk; the failures can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322635#comment-14322635 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Yarn-trunk #840 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/840/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk; the failures can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322525#comment-14322525 ] Devaraj K commented on YARN-3195: - Thanks [~jagadesh.kiran] for your contribution. I am not sure which commands you are referring to for uniformity. The commands that, like the 'queue' command, support '-help' are listed below. {code:xml} yarn application -help yarn applicationattempt -help yarn container -help yarn rmadmin -help yarn scmadmin -help {code} IMO, removing it is not the right thing to do here; instead, adding '-help' to the missing commands would help users. [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch, YARN-3195.patch '-help' is a generic option and should not be listed as a queue subcommand; because of this, uniformity is missing compared to the other commands. Remove the -help entry inside ./yarn queue for uniformity with respect to the other commands. {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
Rohith created YARN-3202: Summary: Improve master container resource release time ICO work preserving restart enabled Key: YARN-3202 URL: https://issues.apache.org/jira/browse/YARN-3202 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor While the NM is registering with the RM, if the NM sends a completed-container status for the master container, the master container's resources are released immediately by triggering the CONTAINER_FINISHED event. This frees all the resources held by the master container so they can be allocated to other pending resource requests from applications. But in case of (ICO) RM work-preserving restart being enabled, if the master container's state is completed, the attempt does not move to FINISHING until container expiry is triggered by the container liveliness monitor. I think the code below need not check whether work-preserving restart is enabled, so that the master container's resources are released immediately and allocated to other pending resource requests of different applications: {code} // Handle received container status, this should be processed after new // RMNode inserted if (!rmContext.isWorkPreservingRecoveryEnabled()) { if (!request.getNMContainerStatuses().isEmpty()) { LOG.info("received container statuses on node manager register :" + request.getNMContainerStatuses()); for (NMContainerStatus status : request.getNMContainerStatuses()) { handleNMContainerStatus(status, nodeId); } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
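A sketch of what this proposal amounts to: drop the work-preserving-recovery guard so the reported statuses are handled on every NM registration. This mirrors the quoted snippet with the outer check removed and is illustrative, not the actual patch:
{code}
// Handle received container statuses unconditionally, so the master
// container's resources are released as soon as the NM reports it
// COMPLETED, instead of waiting for the container liveliness expiry.
if (!request.getNMContainerStatuses().isEmpty()) {
  LOG.info("received container statuses on node manager register :"
      + request.getNMContainerStatuses());
  for (NMContainerStatus status : request.getNMContainerStatuses()) {
    handleNMContainerStatus(status, nodeId);
  }
}
{code}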
[jira] [Updated] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gururaj Shetty updated YARN-3187: - Attachment: YARN-3187.2.patch Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.6.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch YARN-2411 exposes a very useful feature, {{support simple user and group mappings to queues}}, but it's not captured in the documentation. In this JIRA we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
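For context on the feature being documented, a hedged illustration of what a YARN-2411 queue mapping can look like in capacity-scheduler.xml; the user, group, and queue names here are made up, and the exact wording of the documentation belongs to the patch under review:
{code:xml}
<!-- Illustrative only: map user 'alice' to queue 'engineering', members of
     group 'analysts' to queue 'reports', and any other user to a queue
     named after that user via the %user placeholder. -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:alice:engineering,g:analysts:reports,u:%user:%user</value>
</property>
{code}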
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322607#comment-14322607 ] Hadoop QA commented on YARN-3187: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699063/YARN-3187.2.patch against trunk revision ab0b958. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6641//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6641//console This message is automatically generated. Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.6.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch YARN-2411 exposes a very useful feature, {{support simple user and group mappings to queues}}, but it's not captured in the documentation. In this JIRA we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322610#comment-14322610 ] Gururaj Shetty commented on YARN-3187: -- Please review, [~Naganarasimha Garla], [~aw], [~jianhe], [~djp]. Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.6.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch YARN-2411 exposes a very useful feature, {{support simple user and group mappings to queues}}, but it's not captured in the documentation. In this JIRA we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322583#comment-14322583 ] Gururaj Shetty commented on YARN-3187: -- I have completed the documentation of user and queue mapping. [~Naganarasimha Garla] / [~aw] / [~jianhe] / [~djp], please review. Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.6.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch YARN-2411 exposes a very useful feature, {{support simple user and group mappings to queues}}, but it's not captured in the documentation. In this JIRA we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322604#comment-14322604 ] Gururaj Shetty commented on YARN-3168: -- Thanks [~aw] for the patch. Do you convert the .apt files to markdown manually, or do you use a tool for the conversion? Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90
[ https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322656#comment-14322656 ] Junping Du commented on YARN-2799: -- Latest patch looks good to me. Kicking off the Jenkins test again (as the patch is two days old); +1 pending the Jenkins result. cleanup TestLogAggregationService based on the change in YARN-90 Key: YARN-2799 URL: https://issues.apache.org/jira/browse/YARN-2799 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2799.000.patch, YARN-2799.001.patch, YARN-2799.002.patch Clean up TestLogAggregationService based on the change in YARN-90. The following code was added to setup() in YARN-90: {code} dispatcher = createDispatcher(); appEventHandler = mock(EventHandler.class); dispatcher.register(ApplicationEventType.class, appEventHandler); {code} Given this, we should remove this code from each test function to avoid duplication. The same applies to dispatcher.stop(), which is in tearDown(): we can remove it from each test function as well, because it will always be called from tearDown() for each test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
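A sketch of the consolidation under review, assuming JUnit 4 and Mockito as the existing test already uses; the field and method names mirror the snippet above rather than the exact patch:
{code}
// Create and register the dispatcher once per test in setup(), and stop it
// once in tearDown(), instead of repeating both in every test method.
@Before
public void setup() throws Exception {
  dispatcher = createDispatcher();
  appEventHandler = mock(EventHandler.class);
  dispatcher.register(ApplicationEventType.class, appEventHandler);
}

@After
public void tearDown() {
  dispatcher.stop();
}
{code}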