[jira] [Resolved] (YARN-4794) Deadlock in NMClientImpl
[ https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S resolved YARN-4794. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 2.8.0 > Deadlock in NMClientImpl > > > Key: YARN-4794 > URL: https://issues.apache.org/jira/browse/YARN-4794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Fix For: 2.8.0, 2.7.3 > > Attachments: YARN-4794-branch-2.7.patch, YARN-4794.1.patch, > YARN-4794.2.patch > > > Distributed shell app gets stuck on stopping containers after App completes > with the following exception > {code:title = app log} > 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to > the server : java.nio.channels.ClosedByInterruptException > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238654#comment-15238654 ]

Mahens commented on YARN-2624:
------------------------------

The issue still persists in YARN 2.7.1 (HDP 2.3). We see it intermittently, with the same error as above. Is there a Hadoop command to clear the cache directory?

> Resource Localization fails on a cluster due to existing cache directories
> --------------------------------------------------------------------------
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.6.0, 2.5.1
> Reporter: Anubhav Dhoot
> Assignee: Anubhav Dhoot
> Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found that resource localization fails on a cluster with the following error in certain cases.
> {noformat}
> INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, 1412027745352, FILE, null },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory /data/yarn/nm/filecache/27
> at org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
> at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
> at org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
> at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}
[jira] [Commented] (YARN-4794) Deadlock in NMClientImpl
[ https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238653#comment-15238653 ] Rohith Sharma K S commented on YARN-4794: - committed to branch-2.7 also.. thanks [~jianhe] for the patch!! thanks [~vinodkv] for additional review:-) > Deadlock in NMClientImpl > > > Key: YARN-4794 > URL: https://issues.apache.org/jira/browse/YARN-4794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-4794-branch-2.7.patch, YARN-4794.1.patch, > YARN-4794.2.patch > > > Distributed shell app gets stuck on stopping containers after App completes > with the following exception > {code:title = app log} > 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to > the server : java.nio.channels.ClosedByInterruptException > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4794) Deadlock in NMClientImpl
[ https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238648#comment-15238648 ] Rohith Sharma K S commented on YARN-4794: - +1 lgtm. > Deadlock in NMClientImpl > > > Key: YARN-4794 > URL: https://issues.apache.org/jira/browse/YARN-4794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-4794-branch-2.7.patch, YARN-4794.1.patch, > YARN-4794.2.patch > > > Distributed shell app gets stuck on stopping containers after App completes > with the following exception > {code:title = app log} > 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to > the server : java.nio.channels.ClosedByInterruptException > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4633) TestRMRestart.testRMRestartAfterPreemption fails intermittently in trunk
[ https://issues.apache.org/jira/browse/YARN-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238630#comment-15238630 ]

Rohith Sharma K S commented on YARN-4633:
-----------------------------------------

There is no need to backport this JIRA. The issue originates mainly from YARN-4584, which affects version 2.9 only.

> TestRMRestart.testRMRestartAfterPreemption fails intermittently in trunk
> ------------------------------------------------------------------------
>
> Key: YARN-4633
> URL: https://issues.apache.org/jira/browse/YARN-4633
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: test
> Affects Versions: 2.9.0
> Environment: Jenkins
> Reporter: Rohith Sharma K S
> Assignee: Bibin A Chundatt
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4633.patch
>
>
> Jenkins [Build|https://builds.apache.org/job/PreCommit-YARN-Build/10366/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66.txt] failed for the test case below:
> {code}
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
> Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 455.808 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testRMRestartAfterPreemption[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 60.145 sec <<< FAILURE!
> java.lang.AssertionError: Attempt state is not correct (timedout): expected: SCHEDULED actual: FAILED for the application attempt appattempt_1453461355278_0001_04
> at org.junit.Assert.fail(Assert.java:88)
> at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:197)
> at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:172)
> at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForAttemptScheduled(MockRM.java:831)
> at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:818)
> at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartAfterPreemption(TestRMRestart.java:2352)
> {code}
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238624#comment-15238624 ]

Allen Wittenauer commented on YARN-4734:
----------------------------------------

* We definitely need some clarification from ASF legal on whether we can merge licenses like that. My hunch is no, but IANAL.
* The dist and tmp directories should be inside target and not in the root of the module. This makes a ton of other problems go away.
* Why is there a separate profile for this? What UI do I get if I don't build with this profile? This also means the precommit hooks won't work until the hadoop personality is modified (which means the above precommit testing is mostly useless).
* Double check the license headers. At least one of 'em was using the old text.
* Why isn't YarnUI2.md's content in BUILDING.txt? Why does an *end user* care about this information?

Also, heads up to [~andrew.wang] since he is looking to cut a release off of trunk relatively soon. This may have to get jettisoned before the cut.

* The Apache RAT excludes files that don't or shouldn't exist (e.g., travis.yml).
* The Apache RAT excludes files that actually have a license.
* Why does "hadoop-yarn-ui/src/main/resources/META-INF/NOTICE.txt" mention Tez? Why is this file even there?
* hadoop-yarn-ui/src/main/webapp/package.json should have its version pulled from maven. Let's not repeat past mistakes like we did with libhadoop.so getting some random version number.

> Merge branch:YARN-3368 to trunk
> -------------------------------
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch, YARN-4734.4.patch, YARN-4734.5.patch
>
>
> The YARN-2928 branch is planned to be merged back to trunk shortly; it depends on the changes in YARN-3368. This JIRA tracks the merge task.
[jira] [Commented] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238616#comment-15238616 ]

Rohith Sharma K S commented on YARN-4953:
-----------------------------------------

Even though this scenario is very rare, I am thinking one step ahead: it would affect applications with a large number of containers. Thoughts?

> Delete completed container log folder when rolling log aggregation is enabled
> ------------------------------------------------------------------------------
>
> Key: YARN-4953
> URL: https://issues.apache.org/jira/browse/YARN-4953
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
>
> There is a potential bottleneck when a cluster runs a very large number of containers on the same NodeManager for a single application. Linux limits the subfolder count to 32K. If an application has more than 32K containers, container launch fails, and from that point on no more containers can be launched on that node.
> Currently, log folders are deleted after the app has finished. Rolling log aggregation aggregates logs to HDFS periodically.
> I think that once aggregation has completed for finished containers, cleanup can be done, i.e. the log folders of finished containers can be deleted.
[jira] [Commented] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled
[ https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238596#comment-15238596 ]

Rohith Sharma K S commented on YARN-4953:
-----------------------------------------

cc: [~jlowe]

> Delete completed container log folder when rolling log aggregation is enabled
> ------------------------------------------------------------------------------
>
> Key: YARN-4953
> URL: https://issues.apache.org/jira/browse/YARN-4953
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
>
> There is a potential bottleneck when a cluster runs a very large number of containers on the same NodeManager for a single application. Linux limits the subfolder count to 32K. If an application has more than 32K containers, container launch fails, and from that point on no more containers can be launched on that node.
> Currently, log folders are deleted after the app has finished. Rolling log aggregation aggregates logs to HDFS periodically.
> I think that once aggregation has completed for finished containers, cleanup can be done, i.e. the log folders of finished containers can be deleted.
[jira] [Created] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled
Rohith Sharma K S created YARN-4953:
---------------------------------------

Summary: Delete completed container log folder when rolling log aggregation is enabled
Key: YARN-4953
URL: https://issues.apache.org/jira/browse/YARN-4953
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S

There is a potential bottleneck when a cluster runs a very large number of containers on the same NodeManager for a single application. Linux limits the subfolder count to 32K. If an application has more than 32K containers, container launch fails, and from that point on no more containers can be launched on that node.
Currently, log folders are deleted after the app has finished. Rolling log aggregation aggregates logs to HDFS periodically.
I think that once aggregation has completed for finished containers, cleanup can be done, i.e. the log folders of finished containers can be deleted.
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238478#comment-15238478 ]

Daniel Zhi commented on YARN-4676:
----------------------------------

1. I don't expect it to disappear by the next patch, but I will focus on other issues first.
2. I will revert these two files (I didn't notice them because my local diff tool skipped empty changes).
3. I will restore the resolve() (it was lost in my manual merge).
4. Yes, it will simplify the code.
5. refreshNodes(long timeout) basically remains unchanged. The client enforces a timeout, which is not fully integrated with the automatic logic on the RM side (NodesListManager uses the internal default timeout of 3600 seconds). Given that the code checks status every second, it likely expects a smaller timeout from the command line. So the command-line timeout experience would be the same as before. A deeper integration would be to pass the timeout through RefreshNodesRequest to NodesListManager so that it is honored. The client-side wait-and-check can stay, but there is no need for a FORCEFUL decommission, as that is supposed to happen automatically.
6. I am surprised that update() no longer throws an exception (maybe the code has evolved since the original version). So I will remove updateNoThrow() (and will log the full exception in readDecommissioningTimeout).
7. I will add synchronized. It will be called by every node during every heartbeat, but the implementation is efficient enough that synchronized should not cause contention.
8. Is there a list of what "docs" include?

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: Daniel Zhi
> Assignee: Daniel Zhi
> Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, YARN-4676.009.patch
>
>
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status of DECOMMISSIONING nodes automatically and asynchronously after a client/admin has made the graceful decommission request. It tracks the status of each DECOMMISSIONING node to decide when, after all running containers on the node have completed, the node will be transitioned into the DECOMMISSIONED state.
> NodesListManager detects and handles include and exclude list changes to kick off decommission or recommission as necessary.
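Item 5 in the comment above describes a client-side wait-and-check loop that polls status about once per second until a timeout expires. A minimal sketch of that pattern follows. It is not taken from the YARN-4676 patch: the class name `WaitAndCheckSketch`, the method `waitUntil`, and the polled condition are all hypothetical, and the real client would query the ResourceManager instead of a local lambda.

```java
import java.util.function.BooleanSupplier;

/** Illustrative only: a generic poll-until-timeout helper, not code from the patch. */
final class WaitAndCheckSketch {

    /**
     * Polls {@code done} until it returns true or {@code timeoutMillis} elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier done, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (done.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(intervalMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // Hypothetical condition: stands in for "all nodes are DECOMMISSIONED".
        boolean finished = waitUntil(() -> ++polls[0] >= 3, 5_000, 10);
        System.out.println("finished=" + finished + " after " + polls[0] + " polls");
    }
}
```

With the automatic tracking on the RM side, a loop like this would only need to observe the state; on timeout it could simply report failure to the user instead of forcing the decommission itself.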
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238407#comment-15238407 ]

Sangjin Lee commented on YARN-3816:
-----------------------------------

Thanks [~gtCarrera9] for the quick update!

As for the new metric type (i.e. base type + "_" + contributing child entity type), I do see the rationale (or need) to distinguish aggregation coming from different entities. We should still note that the metric would show somewhat awkwardly if we read the applications via queries. Aggregated metrics would look like "MEMORY_YARN_CONTAINER", for example. I'm not quite sure if there would be additional issues.

Also, I think we should be really judicious in permitting the aggregation. The most important case should be YARN container-to-app. For per-framework metrics, AMs should handle internal aggregations themselves and simply add to the application, as they usually have the app-level metrics already anyway. That should be the main way to support them.

(TimelineMetric.java)
- l.244: “accumulated” -> “aggregated”?

(AppLevelTimelineCollector.java)
- l.126: typo: “teal-time” -> “real-time”

(TimelineCollector.java)
- l.83, 87: since these methods expose internals of the {{TimelineCollector}} class, I would make them {{protected}} to ensure only subclasses can use them
- l.171: I could suggest one more optimization in terms of memory footprint. If the given entity does not have metrics, then we can/should skip the entire aggregation status step.
- l.230: It should be {{putIfAbsent()}}. Otherwise, {{put()}} would simply overwrite the value even if the value exists, and it will result in an incorrect object being used.

(ApplicationColumnPrefix.java)
- l.214: per comments on the JIRA, this new {{store()}} method should be removed, right?

I would encourage others to take a closer look at this too. Thanks!

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> ----------------------------------------------------------------------------
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Junping Du
> Assignee: Li Lu
> Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users with aggregated states for each application, including: resource (CPU, memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when based on application-level aggregations rather than raw entity-level data, as far fewer rows need to be scanned (after filtering out non-aggregated entities such as events, configurations, etc.).
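The {{put()}} versus {{putIfAbsent()}} point raised for l.230 in the review above can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not the code under review: `PutIfAbsentSketch`, `Status`, and the two `register...` helpers are hypothetical names, and only `ConcurrentMap.put` and `ConcurrentMap.putIfAbsent` are real JDK calls.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative only: contrasts put() and putIfAbsent() on a shared map. */
final class PutIfAbsentSketch {

    /** Hypothetical stand-in for a per-entity aggregation status object. */
    static final class Status {
        final String createdBy;
        Status(String createdBy) { this.createdBy = createdBy; }
    }

    /** Overwrites unconditionally: a later caller replaces the stored object. */
    static Status registerWithPut(ConcurrentMap<String, Status> table, String id, Status candidate) {
        table.put(id, candidate);
        return candidate;
    }

    /** Keeps the first object: later callers get the instance that is already stored. */
    static Status registerWithPutIfAbsent(ConcurrentMap<String, Status> table, String id, Status candidate) {
        Status existing = table.putIfAbsent(id, candidate);
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Status> table = new ConcurrentHashMap<>();
        Status first = registerWithPutIfAbsent(table, "entity-1", new Status("A"));
        Status second = registerWithPutIfAbsent(table, "entity-1", new Status("B"));
        // Both callers end up holding the object created by A.
        System.out.println(first == second);
        System.out.println(table.get("entity-1").createdBy);
    }
}
```

The important detail is that the caller must use the value returned by `putIfAbsent()` when it is non-null; otherwise it keeps updating its own discarded candidate while the map holds a different object.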
[jira] [Commented] (YARN-4878) Expose scheduling policy and max running apps over JMX for Yarn queues
[ https://issues.apache.org/jira/browse/YARN-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238383#comment-15238383 ] Hadoop QA commented on YARN-4878: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 1 new + 38 unchanged - 0 fixed = 39 total (was 38) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 49s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 153m 10s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.webapp.TestRMWithCSRFFilter | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | JDK v1.8.0_77 Timed out junit tests |
[jira] [Commented] (YARN-4366) Fix Lint Warnings in YARN Common
[ https://issues.apache.org/jira/browse/YARN-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238359#comment-15238359 ] Hadoop QA commented on YARN-4366: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 59s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 12s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 11s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12772862/YARN-4366.001.patch | | JIRA Issue | YARN-4366 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 4e74a152a4fb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (YARN-4366) Fix Lint Warnings in YARN Common
[ https://issues.apache.org/jira/browse/YARN-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238322#comment-15238322 ]

Robert Kanter commented on YARN-4366:
-------------------------------------

We should verify that this doesn't break anything. As explained in [this StackOverflow|http://stackoverflow.com/questions/5401537/i-have-got-this-warning-non-varargs-call-of-varargs-method-with-inexact-argumen], there's a difference between something like {{cls.getMethod(action, null);}} and something like {{cls.getMethod(action);}}. The latter constructs an empty array while the former is ambiguous if it passes a single {{null}} instance or an array with a single {{null}} element (hence the warning).

Unfortunately, besides being reflection, the code is very generic, so it's not straightforward to track down what it's being called on and what those expect here.

> Fix Lint Warnings in YARN Common
> --------------------------------
>
> Key: YARN-4366
> URL: https://issues.apache.org/jira/browse/YARN-4366
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.7.1
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Attachments: YARN-4366.001.patch
>
>
> {noformat}
> [WARNING] /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Router.java:[100,45] non-varargs call of varargs method with inexact argument type for last parameter;
>   cast to java.lang.Class for a varargs call
>   cast to java.lang.Class[] for a non-varargs call and to suppress this warning
> [WARNING] /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factory/providers/RpcFactoryProvider.java:[62,46] non-varargs call of varargs method with inexact argument type for last parameter;
>   cast to java.lang.Class for a varargs call
>   cast to java.lang.Class[] for a non-varargs call and to suppress this warning
> [WARNING] /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factory/providers/RpcFactoryProvider.java:[64,34] non-varargs call of varargs method with inexact argument type for last parameter;
>   cast to java.lang.Object for a varargs call
>   cast to java.lang.Object[] for a non-varargs call and to suppress this warning
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
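The distinction discussed in the comment on YARN-4366 above, and the two casts that the compiler warning suggests, can be reproduced with a small standalone program. This is a sketch for illustration, not the code in Router.java or RpcFactoryProvider.java: `VarargsNullSketch`, `describe`, and `canFind` are hypothetical names, and `Object.hashCode` is used only because it is a convenient public no-argument method.

```java
/** Illustrative only: shows what a varargs method receives for each form of null. */
final class VarargsNullSketch {

    /** Reports what a varargs method actually receives. */
    static String describe(Object... args) {
        return args == null ? "null array" : "array of length " + args.length;
    }

    /** Returns true if Object has a public method with this name and these parameter types. */
    static boolean canFind(String name, Class<?>... parameterTypes) {
        try {
            Object.class.getMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());                 // no arguments: the compiler builds an empty array
        System.out.println(describe((Object[]) null));  // non-varargs call: the array itself is null
        System.out.println(describe((Object) null));    // varargs call: a one-element array holding null

        System.out.println(canFind("hashCode"));                     // empty parameter list
        System.out.println(canFind("hashCode", (Class<?>[]) null));  // null array, which getMethod treats as empty
        System.out.println(canFind("hashCode", (Class<?>) null));    // searches for a one-parameter overload
    }
}
```

An uncast `null` compiles with the warning shown above and is passed as the array itself, so for `getMethod` it behaves like the empty-array form. Whether the same holds for the other call sites flagged in the warnings depends on what each target method does with a null array.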
[jira] [Commented] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238294#comment-15238294 ] Hadoop QA commented on YARN-3150: -
| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 28s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 15s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 18m 48s {color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12798401/YARN-3150-YARN-2928.01.patch |
| JIRA Issue | YARN-3150 |
| Optional Tests | asflicense mvnsite |
| uname | Linux e6373e4ee781 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-2928 / 3df8b0d |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11057/console |
| Powered by | Apache Yetus 0.2.0 http://yetus.apache.org |
This message was automatically generated.
> [Documentation] Documenting the timeline service v2 > --- > > Key: YARN-3150 > URL: https://issues.apache.org/jira/browse/YARN-3150 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Attachments: TimelineServiceV2.html, YARN-3150-YARN-2928.01.patch > > > Let's make sure we will have a document to describe what's new in TS v2, the > APIs, the client libs and so on. We should do better around documentation in > v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238268#comment-15238268 ] Zhe Zhang commented on YARN-2694: - Thanks a lot Wangda for the clear explanation! I think YARN-4140 is what we need. > Ensure only single node labels specified in resource request / host, and node > label expression only specified when resourceName=ANY > --- > > Key: YARN-2694 > URL: https://issues.apache.org/jira/browse/YARN-2694 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Labels: 2.6.1-candidate > Fix For: 2.7.0, 2.6.1 > > Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, > YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, > YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, > YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, > YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, > YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, > YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, > YARN-2694-20150205-3.patch, YARN-2694-branch-2.6.1.txt > > > Currently, node label expression supporting in capacity scheduler is partial > completed. Now node label expression specified in Resource Request will only > respected when it specified at ANY level. And a ResourceRequest/host with > multiple node labels will make user limit, etc. computation becomes more > tricky. > Now we need temporarily disable them, changes include, > - AMRMClient > - ApplicationMasterService > - RMAdminCLI > - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3150: -- Attachment: YARN-3150-YARN-2928.01.patch Posted patch v.1. I created a separate document for v.2, and added a link to this doc from the existing TS doc. Could you please review for correctness and (reasonable) completeness? Since we're still making changes, this doc is not going to be totally complete. It needs to contain the necessary information for people to get started. > [Documentation] Documenting the timeline service v2 > --- > > Key: YARN-3150 > URL: https://issues.apache.org/jira/browse/YARN-3150 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Attachments: TimelineServiceV2.html, YARN-3150-YARN-2928.01.patch > > > Let's make sure we will have a document to describe what's new in TS v2, the > APIs, the client libs and so on. We should do better around documentation in > v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3150: -- Attachment: TimelineServiceV2.html Documentation in html > [Documentation] Documenting the timeline service v2 > --- > > Key: YARN-3150 > URL: https://issues.apache.org/jira/browse/YARN-3150 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Sangjin Lee > Labels: yarn-2928-1st-milestone > Attachments: TimelineServiceV2.html > > > Let's make sure we will have a document to describe what's new in TS v2, the > APIs, the client libs and so on. We should do better around documentation in > v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4878) Expose scheduling policy and max running apps over JMX for Yarn queues
[ https://issues.apache.org/jira/browse/YARN-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-4878: --- Attachment: YARN-4878.002.patch > Expose scheduling policy and max running apps over JMX for Yarn queues > -- > > Key: YARN-4878 > URL: https://issues.apache.org/jira/browse/YARN-4878 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.9.0 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-4878.001.patch, YARN-4878.002.patch > > > There are two things that are not currently visible over JMX: the current > scheduling policy for a queue, and the number of max running apps. It would > be great if these could be exposed over JMX as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4878) Expose scheduling policy and max running apps over JMX for Yarn queues
[ https://issues.apache.org/jira/browse/YARN-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238150#comment-15238150 ] Yufei Gu commented on YARN-4878: Hi [~kasha], thanks for the review. 1. I tried {{scheduler.allocConf}}. But lots of test cases failed. I figured out and fixed them in the second patch. 2. I did this to mimic the existing code. If we are going to consolidate both of them, can we do it in a followup JIRA? > Expose scheduling policy and max running apps over JMX for Yarn queues > -- > > Key: YARN-4878 > URL: https://issues.apache.org/jira/browse/YARN-4878 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.9.0 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-4878.001.patch > > > There are two things that are not currently visible over JMX: the current > scheduling policy for a queue, and the number of max running apps. It would > be great if these could be exposed over JMX as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3816: Attachment: YARN-3816-YARN-2928-v6.patch OK v6 version of the patch. Addressed most of Sangjin's comments and removed some unnecessary code. Specifically, here is what I addressed in ways other than Sangjin's suggestions: - I did not move the aggregation logic to app-level collector completely. Instead, I left the code infrastructure in TimelineCollector but moved the logic to launch the aggregation into app-level collector. In this way, we keep the aggregation infrastructure to be a fairly general one for future collectors (like rack level collector proposed by Vinod a while ago) but can have specific designs for app-level aggregations. - With regard to the result of the aggregations, I store them in the application entity with entity id equal to the application id. The id for each aggregated metric is the original metric plus the aggregation group. Note that I think we need to keep the "aggregation group" information in the metric id because we may have multiple types of entities all posting the same metric name (especially if there are user-defined metrics posted by the application itself) and we may not want to aggregate them together. - I refactored RealTimeAggregationOperation into TimelineMetricOperations. My intuition here is we can provide a basic framework to define operations between timeline metrics, whether it's an aggregation operation or an accumulation operation. Right now the input of a timeline metric operation is the incoming metric, the existing metric, and the previous state. The output should be a new timeline metric and the side effect can be reflected on the state. In this way we can model aggregation operations like SUM, AVG (not supported yet) and accumulation operations like REPLACE and MAX. - I changed the code so that we're not storing the metric aggregation operation. 
I'll rebuild them for offline aggregations through a config. Will address that in YARN-3817. Right now, this patch lives well with the new filter mechanism. Please do let me know if there are other concerns, thanks! > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, > YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, > YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
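The operation model described in the v6 comment (incoming metric, existing metric, and previous state in; new metric out; side effects recorded on the state) might be sketched roughly as follows. This is only an illustration and not the code in the patch: the enum {{MetricOpSketch}}, its {{apply}} signature, the use of plain {{long}} values, and the {{PREV_KEY}} state entry are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "operations between timeline metrics"; the names and
// signature are assumptions and do not mirror the YARN-3816 patch.
enum MetricOpSketch {
    // Accumulation: the newest value wins.
    REPLACE {
        @Override
        long apply(long incoming, long existing, Map<String, Long> state) {
            return incoming;
        }
    },
    // Accumulation: keep the largest value seen so far.
    MAX {
        @Override
        long apply(long incoming, long existing, Map<String, Long> state) {
            return Math.max(incoming, existing);
        }
    },
    // Aggregation: the state remembers what this entity last contributed, so
    // a new report replaces the old contribution instead of being double counted.
    SUM {
        @Override
        long apply(long incoming, long existing, Map<String, Long> state) {
            long previous = state.getOrDefault(PREV_KEY, 0L);
            state.put(PREV_KEY, incoming);
            return existing - previous + incoming;
        }
    };

    static final String PREV_KEY = "previousValue";

    abstract long apply(long incoming, long existing, Map<String, Long> state);
}

class MetricOpDemo {
    public static void main(String[] args) {
        // One state map per reporting entity (for example, per container).
        Map<String, Long> containerA = new HashMap<>();
        Map<String, Long> containerB = new HashMap<>();
        long total = 0;
        total = MetricOpSketch.SUM.apply(10, total, containerA); // total is 10
        total = MetricOpSketch.SUM.apply(5, total, containerB);  // total is 15
        total = MetricOpSketch.SUM.apply(12, total, containerA); // total is 17
        System.out.println(total);
    }
}
```

Keeping the per-entity state outside the operation is one way to let the same SUM logic serve any collector level; whether the patch does it this way is not stated in the comment.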
[jira] [Commented] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237927#comment-15237927 ] Hadoop QA commented on YARN-4950: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | 
{color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 25m 25s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 37s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 36s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 34s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 101m 44s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestClientRMService | | |
[jira] [Updated] (YARN-4951) large IP ranges require the creation of multiple reverse lookup zones
[ https://issues.apache.org/jira/browse/YARN-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-4951: - Attachment: 0001-YARN-4757-address-multiple-reverse-lookup-zones-and-.patch An approach that: 1) Adds config properties for netmask, IP range min, and IP range max 2) Uses SubnetUtils to find the list of addresses based on subnet and mask, selects the network addresses (ending in ".0") from the resulting list, and allows the addresses to be filtered based on range values. > large IP ranges require the creation of multiple reverse lookup zones > - > > Key: YARN-4951 > URL: https://issues.apache.org/jira/browse/YARN-4951 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Maron >Assignee: Jonathan Maron > Attachments: > 0001-YARN-4757-address-multiple-reverse-lookup-zones-and-.patch > > > Large subnet definitions (e.g. specifying a mask value of 255.255.224.0) > yield a large number of potential network addresses, each requiring a > separate reverse zone definition (given that reverse zones include the first > 3 IP bytes in reverse order). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
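To make the zone count in YARN-4951 concrete, here is a minimal sketch of how a large subnet expands into one reverse zone per ".0" network address. It is not the attached patch: it uses plain integer arithmetic rather than SubnetUtils, it omits the IP range min/max filtering, and the class name {{ReverseZoneSketch}}, the method names, and the example subnet 172.17.0.0 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the actual patch is described as using SubnetUtils.
class ReverseZoneSketch {

    // Packs a dotted-quad IPv4 address into a 32-bit int.
    static int toInt(String ip) {
        int value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    // Lists one reverse zone name per 256-address block covered by the subnet.
    // Each zone name is the first three octets in reverse order.
    static List<String> reverseZones(String subnet, String mask) {
        int maskBits = toInt(mask);
        int network = toInt(subnet) & maskBits;
        int broadcast = network | ~maskBits;
        // Use unsigned long arithmetic so addresses above 127.255.255.255 compare correctly.
        long first = network & 0xFFFFFFFFL;
        long last = broadcast & 0xFFFFFFFFL;
        List<String> zones = new ArrayList<>();
        for (long address = first; address <= last; address += 256) {
            int a = (int) address;
            zones.add(((a >>> 8) & 0xFF) + "." + ((a >>> 16) & 0xFF) + "."
                + ((a >>> 24) & 0xFF) + ".in-addr.arpa");
        }
        return zones;
    }

    public static void main(String[] args) {
        List<String> zones = reverseZones("172.17.0.0", "255.255.224.0");
        System.out.println(zones.size());  // 32 zones for a 255.255.224.0 mask
        System.out.println(zones.get(0));  // 0.17.172.in-addr.arpa
    }
}
```

With the 255.255.224.0 mask from the issue description, the subnet spans 32 blocks of 256 addresses, so a single reverse zone is not enough and 32 separate zone definitions would be needed.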
[jira] [Created] (YARN-4952) need configuration mechanism for specifying per-host network interface
Jonathan Maron created YARN-4952: Summary: need configuration mechanism for specifying per-host network interface Key: YARN-4952 URL: https://issues.apache.org/jira/browse/YARN-4952 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Maron Assignee: Jonathan Maron The initial configuration approach for the DNS service specified a bind-address that designated the network interface to which the service should bind its listener port. However, there is a need to potentially specify multiple DNS service instances (HA approach) and therefore a need to specify bind addresses for each instance (and those interfaces may vary between hosts). This may take a form similar to the RM HA approach (rm1, rm2). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237761#comment-15237761 ] Allen Wittenauer edited comment on YARN-4950 at 4/12/16 6:50 PM: - -00: * naive copy parallel-tests from hadoop-common's pom.xml into yarn-client and yarn-server-resourcemanager was (Author: aw): -00: * copy parallel-tests from hadoop-common's pom.xml into yarn-client and yarn-server-resourcemanager > configure parallel-tests for yarn-client and yarn-server-resourcemanager > > > Key: YARN-4950 > URL: https://issues.apache.org/jira/browse/YARN-4950 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: YARN-4950.00.patch > > > Unit tests for yarn-client and yarn-server-resourcemanager take over an hour > each. The parallel-tests profile should be configured to reduce the > execution time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-4950: --- Attachment: YARN-4950.00.patch -00: * copy parallel-tests from hadoop-common's pom.xml into yarn-client and yarn-server-resourcemanager > configure parallel-tests for yarn-client and yarn-server-resourcemanager > > > Key: YARN-4950 > URL: https://issues.apache.org/jira/browse/YARN-4950 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: YARN-4950.00.patch > > > Unit tests for yarn-client and yarn-server-resourcemanager take over an hour > each. The parallel-tests profile should be configured to reduce the > execution time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4951) large IP ranges require the creation of multiple reverse lookup zones
Jonathan Maron created YARN-4951: Summary: large IP ranges require the creation of multiple reverse lookup zones Key: YARN-4951 URL: https://issues.apache.org/jira/browse/YARN-4951 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Maron Assignee: Jonathan Maron Large subnet definitions (e.g. specifying a mask value of 255.255.224.0) yield a large number of potential network addresses, each requiring a separate reverse zone definition (given that reverse zones include the first 3 IP bytes in reverse order). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-4757: - Attachment: 0001-YARN-4757-Initial-code-submission-for-DNS-Service.patch While I await branch-committer status, I am uploading an initial patch that should give a more concrete sense of a DNS service implementation. I made use of the dnsjava library and implemented a good portion of the specification. I plan to provide relatively frequent updates and record sub-tasks to address any issues unearthed. > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > Attachments: > 0001-YARN-4757-Initial-code-submission-for-DNS-Service.patch, YARN-4757- > Simplified discovery of services via DNS mechanisms.pdf > > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of service-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry specific (java) API and a REST interface. In practice, this makes it > very difficult for wiring up existing clients and services. For e.g, dynamic > configuration of dependent endpoints of a service is not easy to implement > using the present registry-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-known DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS based mechanism but left it as a > future task. 
(Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-4950: --- Affects Version/s: 3.0.0 > configure parallel-tests for yarn-client and yarn-server-resourcemanager > > > Key: YARN-4950 > URL: https://issues.apache.org/jira/browse/YARN-4950 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Critical > > Unit tests for yarn-client and yarn-server-resourcemanager take over an hour > each. The parallel-tests profile should be configured to reduce the > execution time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4950) configure parallel-tests for yarn-client and yarn-server-resourcemanager
Allen Wittenauer created YARN-4950: -- Summary: configure parallel-tests for yarn-client and yarn-server-resourcemanager Key: YARN-4950 URL: https://issues.apache.org/jira/browse/YARN-4950 Project: Hadoop YARN Issue Type: Test Components: test Reporter: Allen Wittenauer Priority: Critical Unit tests for yarn-client and yarn-server-resourcemanager take over an hour each. The parallel-tests profile should be configured to reduce the execution time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4794) Deadlock in NMClientImpl
[ https://issues.apache.org/jira/browse/YARN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4794: -- Attachment: YARN-4794-branch-2.7.patch branch-2.7 patch attached, [~rohithsharma], could you review ? thanks ! > Deadlock in NMClientImpl > > > Key: YARN-4794 > URL: https://issues.apache.org/jira/browse/YARN-4794 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-4794-branch-2.7.patch, YARN-4794.1.patch, > YARN-4794.2.patch > > > Distributed shell app gets stuck on stopping containers after App completes > with the following exception > {code:title = app log} > 15/12/10 14:52:20 INFO distributedshell.ApplicationMaster: Application > completed. Stopping running containers > 15/12/10 14:52:20 WARN ipc.Client: Exception encountered while connecting to > the server : java.nio.channels.ClosedByInterruptException > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237641#comment-15237641 ] Hadoop QA commented on YARN-4909: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 2s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 48s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 184m 51s {color} | {color:black} {color} | \\ \\ || Reason || Tests || |
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237628#comment-15237628 ] Li Lu commented on YARN-3816: - Thanks for the pointer Sangjin! Sure let's not use column names for aggregation op storage. I can make a config key so that we can rebuild the aggregation operation according to the config. > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237606#comment-15237606 ] Sangjin Lee commented on YARN-3816: --- Sorry I missed the column post-fix part earlier in my review.
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237602#comment-15237602 ] Sangjin Lee commented on YARN-3816: --- We discussed the cases where we may need to support adding more info for the metrics on YARN-4053. Especially see [this comment|https://issues.apache.org/jira/browse/YARN-4053?focusedCommentId=14994603&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14994603] (although going over the full discussion is informative). The conclusion was that it would be good not to store additional metadata as column pre- or post-fixes due to the complications mentioned in YARN-4053. If we can find a way to avoid that here, it would be ideal. If this is to support offline aggregation, options like separate configuration were also discussed. If we end up storing that metadata in HBase, one thing we should *definitely* avoid is the need to read it back to do any writes. We're ruling out doing read-then-write as a principle, otherwise it would open up a world of pain in terms of performance as well as correctness.
[jira] [Commented] (YARN-4939) the decommissioning Node should keep alive if NM restart
[ https://issues.apache.org/jira/browse/YARN-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237599#comment-15237599 ] Daniel Templeton commented on YARN-4939: Thanks, [~sandflee]. The patch looks good to me. Could you please add tests to cover the scenario the patch addresses? > the decommissioning Node should keep alive if NM restart > - > > Key: YARN-4939 > URL: https://issues.apache.org/jira/browse/YARN-4939 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4939.01.patch, YARN-4939.02.patch > > > 1, gracefully decommission a node A > 2, restart node A > 3, node A could not register to RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237578#comment-15237578 ] Eric Payne commented on YARN-2113: -- bq. I planned to work on YARN-4781 soon but I'm working on other stuffs so at least I will not be able to work on it in recent 1-2 months. Please feel free to take over. Thanks, [~leftnoteasy]. Sure, I would like to drive this if that's okay. > Add cross-user preemption within CapacityScheduler's leaf-queue > --- > > Key: YARN-2113 > URL: https://issues.apache.org/jira/browse/YARN-2113 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > Preemption today only works across queues and moves around resources across > queues per demand and usage. We should also have user-level preemption within > a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4949) [YARN-3368] Support pagination for RM/ATS Web UI applications page.
Rohith Sharma K S created YARN-4949:
---

Summary: [YARN-3368] Support pagination for RM/ATS Web UI applications page.
Key: YARN-4949
URL: https://issues.apache.org/jira/browse/YARN-4949
Project: Hadoop YARN
Issue Type: Sub-task
Components: webapp
Reporter: Rohith Sharma K S

Users would expect pagination on the applications page in the RM/ATS web UI. The old RM/ATS web UI has the limitation that it takes a long time to render applications in the browser. It would be good to support batch retrieval of applications from the server rather than retrieving all the applications at once. This requires a number of things to be considered on both the RMWebApp and the RM/ATS server end.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
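The batch retrieval asked for above can be pictured with a small offset/limit sketch. This is only an illustration of the general idea, not the design proposed in the JIRA: the class name PaginationSketch, the page() helper, and the offset/limit parameters are all hypothetical, and an in-memory list stands in for the server-side application store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PaginationSketch {

    // Hypothetical helper: returns at most `limit` items starting at `offset`.
    // A real server would apply the same bounds while reading from its store
    // instead of materializing every application first.
    static <T> List<T> page(List<T> all, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        if (offset >= all.size()) {
            return Collections.emptyList();
        }
        // Use long arithmetic so a very large limit cannot overflow int.
        int end = (int) Math.min((long) all.size(), (long) offset + limit);
        return new ArrayList<>(all.subList(offset, end));
    }

    public static void main(String[] args) {
        // Placeholder application ids, not real YARN application ids.
        List<String> apps = Arrays.asList("app_1", "app_2", "app_3", "app_4", "app_5");
        int limit = 2;
        // The client asks for one batch at a time rather than the full list.
        for (int offset = 0; offset < apps.size(); offset += limit) {
            System.out.println("offset=" + offset + " -> " + page(apps, offset, limit));
        }
    }
}
```

With this shape the browser only renders one batch per request; whether the real UI would use offsets, cursors, or something else is left open by the issue text.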
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237548#comment-15237548 ] Li Lu commented on YARN-3816: - Thanks [~varun_saxena] and [~sjlee0]! My bottom line is that we may want to store some metadata for some timeline metrics. How to perform aggregation is one piece of metadata that we want to keep. We need this data so that offline aggregations, like user- and flow-level offline aggregation, can read out the aggregation operation. Is it OK to reserve a separate column for each metric to store its metadata (like _META)? We can skip this for metrics whose aggregation operation is NOP, at least for now? Thoughts?
[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237533#comment-15237533 ] Daniel Templeton commented on YARN-4940: I agree with [~kshukla] on both counts. The fix seems sound, and I like that it gets rid of the extra {{UnknownNodeId}} class. The patch needs to add tests to test the scenario that caused the issue. > yarn node -list -all failed if RM start with decommissioned node > > > Key: YARN-4940 > URL: https://issues.apache.org/jira/browse/YARN-4940 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4940.01.patch, YARN-4940.02.patch > > > 1, add a node to exclude file > 2, start RM > 3, run yarn node -list -all , see the following exception > {quote} > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at > org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at 
java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at >
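The top frames of the trace above point at an unconditional downcast from the abstract NodeId type to its protobuf-backed implementation. The following is a minimal, self-contained sketch of that failure pattern only. The classes here are simplified stand-ins that borrow names from the trace; they are not the Hadoop classes, and safeToProto() is a hypothetical defensive variant shown for contrast, not the change in the attached patch (which, per the comment above, removes the UnknownNodeId class instead).

```java
public class CastSketch {

    // Simplified stand-in for the abstract record type.
    static abstract class NodeId {
        abstract String getHost();
        abstract int getPort();
    }

    // Simplified stand-in for the protobuf-backed implementation.
    static class NodeIdPBImpl extends NodeId {
        private final String host;
        private final int port;

        NodeIdPBImpl(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override String getHost() { return host; }
        @Override int getPort() { return port; }

        // Placeholder for real protobuf serialization.
        String getProto() { return host + ":" + port; }
    }

    // Stand-in for a custom subclass that is NOT protobuf-backed.
    static class UnknownNodeId extends NodeId {
        private final String host;

        UnknownNodeId(String host) { this.host = host; }

        @Override String getHost() { return host; }
        @Override int getPort() { return -1; }
    }

    // Mirrors the failing pattern: assumes every NodeId is a NodeIdPBImpl.
    static String unsafeToProto(NodeId id) {
        return ((NodeIdPBImpl) id).getProto();
    }

    // Hypothetical defensive variant: copy the fields when the instance is
    // not already protobuf-backed.
    static String safeToProto(NodeId id) {
        NodeIdPBImpl pb = (id instanceof NodeIdPBImpl)
                ? (NodeIdPBImpl) id
                : new NodeIdPBImpl(id.getHost(), id.getPort());
        return pb.getProto();
    }

    public static void main(String[] args) {
        NodeId unknown = new UnknownNodeId("excluded-host");
        try {
            unsafeToProto(unknown);
        } catch (ClassCastException e) {
            System.out.println("unsafe cast failed: " + e.getClass().getSimpleName());
        }
        System.out.println("safe conversion: " + safeToProto(unknown));
    }
}
```

Either removing the non-protobuf subclass or converting before serialization avoids the cast failure; the sketch only shows why the cast breaks.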
[jira] [Updated] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
[ https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4514: -- Attachment: YARN-4514-YARN-3368.7.patch Fixed asf warnings... > [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses > -- > > Key: YARN-4514 > URL: https://issues.apache.org/jira/browse/YARN-4514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: YARN-4514-YARN-3368.1.patch, > YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, > YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, > YARN-4514-YARN-3368.6.patch, YARN-4514-YARN-3368.7.patch > > > We have several configurations are hard-coded, for example, RM/ATS addresses, > we should make them configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4932) [Umbrella] YARN/MR test failures on Windows
[ https://issues.apache.org/jira/browse/YARN-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4932: -- Summary: [Umbrella] YARN/MR test failures on Windows (was: (Umbrella) YARN/MR test failures on Windows) > [Umbrella] YARN/MR test failures on Windows > --- > > Key: YARN-4932 > URL: https://issues.apache.org/jira/browse/YARN-4932 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Junping Du > > We found several test failures related to Windows. Here is Umbrella jira to > track them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4928) Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
[ https://issues.apache.org/jira/browse/YARN-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237489#comment-15237489 ]

Vinod Kumar Vavilapalli commented on YARN-4928:
---

[~djp], can this be put on older releases too (2.8.x, 2.7.x, etc.)?

> Some yarn.server.timeline.* tests fail on Windows attempting to use a test root path containing a colon
> ---
>
> Key: YARN-4928
> URL: https://issues.apache.org/jira/browse/YARN-4928
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: test
> Affects Versions: 2.8.0
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
> Reporter: Gergely Novák
> Assignee: Gergely Novák
> Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4928.001.patch, YARN-4928.002.patch, YARN-4928.003.patch, YARN-4928.004.patch, YARN-4928.005.patch, YARN-4928.006.patch
>
> yarn.server.timeline.TestEntityGroupFSTimelineStore.* and yarn.server.timeline.TestLogInfo.* fail on Windows because they attempt to use test root paths like "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo", which contain a ":" (after the Windows drive letter), and DFSUtil.isValidName() does not accept paths containing ":".
> This problem is identical to HDFS-6189, so I suggest using the same approach: use "/tmp/..." as the test root dir instead of System.getProperty("test.build.data", System.getProperty("java.io.tmpdir")).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
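To make the colon problem concrete, here is a small sketch comparing the two ways of picking a test root. It is an illustration under stated assumptions: hasColonComponent() is a simplified stand-in written for this example and is not DFSUtil.isValidName() (which performs additional checks), and the class and method names (TestRootSketch, propertyBasedRoot, fixedRoot) are hypothetical.

```java
public class TestRootSketch {

    // Simplified stand-in for the ':' rule described in the issue: reports
    // whether any path component contains a colon.
    static boolean hasColonComponent(String path) {
        for (String component : path.split("/")) {
            if (component.contains(":")) {
                return true;
            }
        }
        return false;
    }

    // The lookup quoted in the issue: on Windows the result usually carries
    // a drive letter, hence a ':'.
    static String propertyBasedRoot() {
        return System.getProperty("test.build.data",
                System.getProperty("java.io.tmpdir"));
    }

    // The approach suggested in the issue: a fixed root under /tmp.
    static String fixedRoot(String testName) {
        return "/tmp/" + testName;
    }

    public static void main(String[] args) {
        // Path taken from the issue description.
        String windowsStyle = "/C:/hdp/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/"
                + "hadoop-yarn-server-timeline-pluginstorage/target/test-dir/TestLogInfo";
        System.out.println("windows-style root has colon: " + hasColonComponent(windowsStyle));
        System.out.println("fixed root has colon: " + hasColonComponent(fixedRoot("TestLogInfo")));
        System.out.println("property-based root on this machine: " + propertyBasedRoot());
    }
}
```

The fixed root avoids the drive letter entirely, which is why it passes a colon check regardless of the platform the tests run on.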
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237487#comment-15237487 ] Sangjin Lee commented on YARN-3816: --- I had a similar question to Varun. Is there another way to handle the aggregation operation other than making it part of the column pre/post-fix?
[jira] [Commented] (YARN-4633) TestRMRestart.testRMRestartAfterPreemption fails intermittently in trunk
[ https://issues.apache.org/jira/browse/YARN-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237443#comment-15237443 ]

Vinod Kumar Vavilapalli commented on YARN-4633:
---

Slightly old JIRA, but [~bibinchundatt] / [~rohithsharma], is this applicable to 2.8.x / 2.7.x / 2.6.x also? If so, can this be backported / committed to those branches too? Tx.

> TestRMRestart.testRMRestartAfterPreemption fails intermittently in trunk
> ---
>
> Key: YARN-4633
> URL: https://issues.apache.org/jira/browse/YARN-4633
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: test
> Affects Versions: 2.9.0
> Environment: Jenkin
> Reporter: Rohith Sharma K S
> Assignee: Bibin A Chundatt
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4633.patch
>
> Jenkins [Build|https://builds.apache.org/job/PreCommit-YARN-Build/10366/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66.txt] failed for below test case,
> {code}
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
> Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> Tests run: 54, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 455.808 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testRMRestartAfterPreemption[0](org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 60.145 sec <<< FAILURE!
> java.lang.AssertionError: Attempt state is not correct (timedout): expected: SCHEDULED actual: FAILED for the application attempt appattempt_1453461355278_0001_04
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:197)
>   at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:172)
>   at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForAttemptScheduled(MockRM.java:831)
>   at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:818)
>   at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartAfterPreemption(TestRMRestart.java:2352)
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
[ https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237426#comment-15237426 ]

Hadoop QA commented on YARN-4514:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 2m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| 0 | mvndep | 3m 51s | Maven dependency ordering for branch |
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| -1 | whitespace | 0m 0s | The patch has 2 line(s) with tabs. |
| -1 | asflicense | 0m 34s | Patch generated 5 ASF License warnings. |
| | | 7m 16s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:e35bf0f |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12798281/YARN-4514-YARN-3368.6.patch |
| JIRA Issue | YARN-4514 |
| Optional Tests | asflicense |
| uname | Linux ec7542f9784e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-3368 / e35bf0f |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/11052/artifact/patchprocess/whitespace-tabs.txt |
| asflicense | https://builds.apache.org/job/PreCommit-YARN-Build/11052/artifact/patchprocess/patch-asflicense-problems.txt |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui . U: . |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11052/console |
| Powered by | Apache Yetus 0.2.0 http://yetus.apache.org |

This message was automatically generated.

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> ---
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Sunil G
> Attachments: YARN-4514-YARN-3368.1.patch, YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, YARN-4514-YARN-3368.6.patch
>
> We have several configurations are hard-coded, for example, RM/ATS addresses, we should make them configurable.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237387#comment-15237387 ] sandflee commented on YARN-4940: Thanks [~kshukla]. The test failures seem unrelated; I'll check them later, and I'll add a test.
[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237335#comment-15237335 ] Kuhu Shukla commented on YARN-4940: --- The fix in the patch looks good. I have not looked at all the test failures yet. Just one comment, it might be nice to have a specific test to cover this failure besides the testUnknownNodeId since AFAICT the test did not catch this specific failure.
[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237266#comment-15237266 ] Kuhu Shukla commented on YARN-4940: --- Thank you for reporting this [~sandflee]. Did the fix for YARN-4723 not fix the issue for you? > yarn node -list -all failed if RM start with decommissioned node > > > Key: YARN-4940 > URL: https://issues.apache.org/jira/browse/YARN-4940 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4940.01.patch, YARN-4940.02.patch > > > 1, add a node to exclude file > 2, start RM > 3, run yarn node -list -all , see the following exception > {quote} > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at > com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at > 
org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at >
[jira] [Updated] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
[ https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4514: -- Attachment: YARN-4514-YARN-3368.6.patch A few more changes done: - Updated the LICENSE file with the changed versions. - Cleaned up the code by removing some debug logs that had been added. [~leftnoteasy] and [~varun_saxena], please help to check the patch. > [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses > -- > > Key: YARN-4514 > URL: https://issues.apache.org/jira/browse/YARN-4514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: YARN-4514-YARN-3368.1.patch, > YARN-4514-YARN-3368.2.patch, YARN-4514-YARN-3368.3.patch, > YARN-4514-YARN-3368.4.patch, YARN-4514-YARN-3368.5.patch, > YARN-4514-YARN-3368.6.patch > > > Several configurations are hard-coded, for example the RM/ATS addresses; > we should make them configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes
[ https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237251#comment-15237251 ] Hadoop QA commented on YARN-4947: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 58s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 20s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 147m 44s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL |
[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237227#comment-15237227 ] Hadoop QA commented on YARN-4940: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 10s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 52s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 164m 24s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels | | | hadoop.yarn.webapp.TestRMWithCSRFFilter | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | |
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237183#comment-15237183 ] Bibin A Chundatt commented on YARN-4909: Thanks [~vvasudev]/[~Naganarasimha]/[~sunilg] for looking into the issue. There is still a chance that two tests pick the same port, but it is very small. Attaching a patch for the same. > Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter > --- > > Key: YARN-4909 > URL: https://issues.apache.org/jira/browse/YARN-4909 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Brahma Reddy Battula >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, > 0003-YARN-4909.patch, 0004-YARN-4909.patch, 0005-YARN-4909.patch > > > *Precommit link* > https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/ > *Trace* > {noformat} > com.sun.jersey.test.framework.spi.container.TestContainerException: > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) > at > org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) > at > org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) > at >
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79) > at > com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342) > at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217) > at > org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4909: --- Attachment: 0005-YARN-4909.patch > Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter > --- > > Key: YARN-4909 > URL: https://issues.apache.org/jira/browse/YARN-4909 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Brahma Reddy Battula >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, > 0003-YARN-4909.patch, 0004-YARN-4909.patch, 0005-YARN-4909.patch > > > *Precommit link* > https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/ > *Trace* > {noformat} > com.sun.jersey.test.framework.spi.container.TestContainerException: > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) > at > org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) > at > org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219) > at > 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79) > at > com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342) > at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217) > at > org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes
[ https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237156#comment-15237156 ] Varun Vasudev commented on YARN-4006: - [~gss2002] - have you had a chance to look at HADOOP-9054 and HADOOP-12082? HADOOP-12082 looks similar to the problem you're trying to solve. With respect to your patch, can you elaborate on how AMs will authenticate with the timeline server? Are you passing the credentials to the AM as part of the job submission? > YARN ATS Alternate Kerberos HTTP Authentication Changes > --- > > Key: YARN-4006 > URL: https://issues.apache.org/jira/browse/YARN-4006 > Project: Hadoop YARN > Issue Type: Improvement > Components: security, timelineserver >Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2 >Reporter: Greg Senia >Assignee: Greg Senia > Attachments: YARN-4006-branch-trunk.patch, YARN-4006-branch2.6.0.patch > > > When attempting to use the Hadoop Alternate Authentication Classes, they do > not exactly work with what was built with > https://issues.apache.org/jira/browse/YARN-1935. > I went ahead and made the following changes to support using a custom > AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
> String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: " + authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>       PseudoDelegationTokenAuthenticationHandler.class.getName());
>   LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) ||
>     (UserGroupInformation.isSecurityEnabled() &&
>     conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE))) {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE, authType);
>     LOG.info("AuthType: " + authType);
>   } else {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>         KerberosDelegationTokenAuthenticationHandler.class.getName());
>     LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   }
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>       filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
>     try {
>       principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
>     } catch (IOException ex) {
>       throw new RuntimeException(
>           "Could not resolve Kerberos principal name: " + ex.toString(),
>           ex);
>     }
>     filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
>         principal);
>   }
> }
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
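The branching in the quoted YARN-4006 snippet can be easier to follow when it is reduced to plain strings. The following is only a sketch of that decision logic, not the patch itself: the class AuthTypeResolver and its two methods are hypothetical names, the handler names are returned as bare strings instead of real Hadoop classes, and the values "simple" and "kerberos" are assumed to match PseudoAuthenticationHandler.TYPE and KerberosAuthenticationHandler.TYPE. It also compares constant-first, so a missing setting does not throw, which differs from the quoted code.

```java
// Hypothetical, dependency-free sketch of the handler selection in the quoted snippet.
class AuthTypeResolver {
    // Assumed values of PseudoAuthenticationHandler.TYPE and KerberosAuthenticationHandler.TYPE.
    static final String SIMPLE = "simple";
    static final String KERBEROS = "kerberos";

    // Returns the handler that would end up in AUTH_TYPE for a configured authType.
    static String resolveHandler(String authType) {
        if (SIMPLE.equals(authType)) {
            return "PseudoDelegationTokenAuthenticationHandler";
        }
        if (KERBEROS.equals(authType)) {
            return "KerberosDelegationTokenAuthenticationHandler";
        }
        // Any other value (for example a custom AltKerberos handler class) is kept as configured.
        return authType;
    }

    // True when the snippet would go on to resolve _HOST in the Kerberos principal:
    // either authType is "kerberos", or security is enabled and the cluster-wide
    // hadoop.security.authentication setting is "kerberos".
    static boolean resolvesPrincipal(String authType, boolean securityEnabled, String hadoopAuth) {
        if (SIMPLE.equals(authType)) {
            return false;
        }
        return KERBEROS.equals(authType) || (securityEnabled && KERBEROS.equals(hadoopAuth));
    }

    public static void main(String[] args) {
        String custom = "com.example.CustomAltKerberosHandler"; // hypothetical class name
        System.out.println(resolveHandler(custom));
        System.out.println(resolvesPrincipal(custom, true, KERBEROS));
    }
}
```

The point of the change, as far as the snippet shows, is the middle case: a custom handler class on a Kerberos-secured cluster keeps its own AUTH_TYPE but still gets the principal resolution that previously applied only when the type was literally "kerberos".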
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237144#comment-15237144 ] Sunil G commented on YARN-4909: --- Yes [~Naganarasimha Garla], we need that to be a random number, something like {{9998 + rnd.nextInt() % 500}}, so that the ports differ when tests run in parallel. However, this does not fully solve the problem: there can still be corner cases, but the probability is very low. > Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter > --- > > Key: YARN-4909 > URL: https://issues.apache.org/jira/browse/YARN-4909 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Brahma Reddy Battula >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, > 0003-YARN-4909.patch, 0004-YARN-4909.patch > > > *Precommit link* > https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/ > *Trace* > {noformat} > com.sun.jersey.test.framework.spi.container.TestContainerException: > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) > at > org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) > at > org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) > at
> com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79) > at > com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342) > at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217) > at > org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
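The random-port idea discussed in the YARN-4909 comments could be sketched roughly as below. This is not the attached patch: the class PortPicker, the method pickRandomFreePort and the retry count are hypothetical, and the base of 9998 and spread of 500 are taken from the comment only as illustrative values. The sketch uses a bounded nextInt(range) and probes each candidate with java.net.ServerSocket before returning it.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Random;

// Hypothetical helper: pick a random port in [base, base + range) that is free right now.
class PortPicker {
    static int pickRandomFreePort(int base, int range, int maxAttempts, Random rnd)
            throws IOException {
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            int candidate = base + rnd.nextInt(range);
            // Binding succeeds only if nothing else is listening on the candidate port;
            // the probe socket is closed again before the port is returned.
            try (ServerSocket probe = new ServerSocket(candidate)) {
                return candidate;
            } catch (IOException e) {
                last = e; // port in use (or otherwise unusable), try another one
            }
        }
        throw new IOException(
            "No free port found in [" + base + ", " + (base + range) + ")", last);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(pickRandomFreePort(9998, 500, 20, new Random()));
    }
}
```

As the comments note, this only lowers the probability of a BindException. Another process can still grab the port between the probe and the moment the test container binds it, so the corner case remains.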
[jira] [Commented] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237140#comment-15237140 ] Naganarasimha G R commented on YARN-4948: - Hi [~wjlei], the patch does not seem to compile. Can you please check? > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: YARN-4948-branch-2.7.0.001.patch, YARN-4948.001.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237134#comment-15237134 ] Naganarasimha G R commented on YARN-4909: - Well, even if we go for [~vvasudev]'s solution, we can't go for a fixed addition or subtraction; it has to be a random number. > Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter > --- > > Key: YARN-4909 > URL: https://issues.apache.org/jira/browse/YARN-4909 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Brahma Reddy Battula >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, > 0003-YARN-4909.patch, 0004-YARN-4909.patch > > > *Precommit link* > https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/ > *Trace* > {noformat} > com.sun.jersey.test.framework.spi.container.TestContainerException: > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) > at > org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) > at > org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) > at >
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79) > at > com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342) > at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217) > at > org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4810) NM applicationpage cause internal error 500
[ https://issues.apache.org/jira/browse/YARN-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237128#comment-15237128 ] Hudson commented on YARN-4810: -- FAILURE: Integrated in Hadoop-trunk-Commit #9597 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9597/]) YARN-4810. NM applicationpage cause internal error 500. Contributed by (naganarasimha_gr: rev 437e9d6475a91cafc4c993b206312912b5f13ad9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMAppsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ApplicationPage.java > NM applicationpage cause internal error 500 > --- > > Key: YARN-4810 > URL: https://issues.apache.org/jira/browse/YARN-4810 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.9.0 > > Attachments: 0001-YARN-4810.patch, 0002-YARN-4810.patch, 1.png, 2.png > > > Use url /node/application/ > *Case 1* > {noformat} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.webapp.dao.AppInfo.<init>(AppInfo.java:45) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:82) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) > at > 
org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58) > ... 44 more > {noformat} > *Case 2* > {noformat} > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > Caused by: java.util.NoSuchElementException > at > com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) > at > org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:131) > at > org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:126) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.ApplicationPage$ApplicationBlock.render(ApplicationPage.java:79) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) > at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) > at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.NMController.application(NMController.java:58) > ... 44 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2567) Add a percentage-node threshold for RM to wait for new allocations after restart/failover
[ https://issues.apache.org/jira/browse/YARN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237129#comment-15237129 ] Jason Lowe commented on YARN-2567: -- The problem with delaying the state store operations, or otherwise making them asynchronous with the state changes they are intended to record, is that it will always lead to inconsistent recovery if we fail between the state change and the state store operation. IMHO we cannot let the NM registration complete, or at least cannot start using the node in a way that is inconsistent with the state currently recorded in the state store, until the state store operation completes. So we might be able to let the node register, but we should not allocate and launch new containers on it until the state store update completes, or we end up with the problem described above. In general there needs to be a minimal performance expectation from the state store for a given cluster setup, or the RM is going to do some bad things. For example, we can't sustain a situation where applications are being submitted at a rate faster than we can record them to the state store. Similarly, for large clusters it's going to be problematic if a large network cut occurs and we need to record the expiration of thousands of containers but can't do so in a reasonable timeframe. If we tell applications that containers on those nodes are lost _before_ we record the lost node in the state store, then if we fail over before the node rejoins, the new RM instance won't know it's supposed to kill the containers on the rejoining node. AMs probably won't appreciate being told a container has completed only to have it keep running and count against their user limits/headroom in the future. Therefore we have to record the node as lost in the state store before we inform AMs the containers and node are gone. 
> Add a percentage-node threshold for RM to wait for new allocations after > restart/failover > - > > Key: YARN-2567 > URL: https://issues.apache.org/jira/browse/YARN-2567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > This is the remaining part of YARN-2001 - to halt allocations after restart > till x% of nodes sync back with the RM. This is useful for avoiding bad > scheduling during the time the nodes are still joining back after a > restart/failover. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
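The ordering argued for in this comment (persist the "node lost" record first, and only then tell AMs) could be sketched as follows. This is an illustration only, not RM code: `NodeLossOrderingSketch`, `StateStore`, `AmNotifier` and `handleNodeLost` are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;

class NodeLossOrderingSketch {
    // Hypothetical stand-in for a durable state store.
    interface StateStore {
        void recordNodeLost(String nodeId) throws Exception;
    }

    // Hypothetical stand-in for whatever informs AMs about lost containers.
    interface AmNotifier {
        void containersLost(String nodeId);
    }

    // Record the loss durably first; notify only if the write succeeded.
    // Returns false (and notifies nobody) when the store write fails, so a
    // later failover never sees AMs that know more than the store does.
    static boolean handleNodeLost(String nodeId, StateStore store, AmNotifier notifier) {
        try {
            store.recordNodeLost(nodeId);
        } catch (Exception e) {
            return false;
        }
        notifier.containersLost(nodeId);
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        handleNodeLost("node-1", id -> log.add("store:" + id), id -> log.add("notify:" + id));
        System.out.println(log);
    }
}
```

The cost of this ordering is the one the comment points out: notification latency is bounded below by state store latency, which is why a minimum performance expectation for the store matters.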
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237127#comment-15237127 ] Naganarasimha G R commented on YARN-4909: - Hi [~vvasudev], We also had a discussion here along similar lines: bq. Don't undo the JerseyTest options to set a port. If a user has provided a port via the system properties, we should honor it. In the current implementation of {{JerseyTestBase.initializeJerseyPort}} we simply override "jerseyPort" with {{System.setProperty("jersey.test.port", Integer.toString(jerseyPort));}}, so there too a port provided by the user gets overridden by the code. Also, is there any possibility that this property is set by the user? If so, we can do it this way: the overridden {{getPort}} can check whether *"jersey.test.port"* is set and, if it is, pass the port configured in that system property as the argument to {{ServerSocketUtil.getPort}}; otherwise it uses {{port}}. bq. I'm not convinced the current patch will fix the issue - it'll probably make the occurrences less frequent. Initially I felt the same, but *ServerSocketUtil.getPort* has been designed to work that way. IIRC the solution was discussed in this [comment|https://issues.apache.org/jira/browse/YARN-3528?focusedCommentId=14564091=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14564091], and they also wanted to respect allocating the port that was initially given. As per the test results, the probability of test failure is lower. Also, whatever approach we take, there would still be a slight possibility that the ports overlap, right? 
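The suggestion in this comment (use the `jersey.test.port` system property when it is set, otherwise the default port, and hand that value to a free-port search) might look roughly like the sketch below. It is a hedged illustration, not the patch: `JerseyPortSketch`, `resolveStartPort` and `findFreePort` are hypothetical names, and `findFreePort` is a simple stand-in that does not reproduce `ServerSocketUtil.getPort`.

```java
import java.io.IOException;
import java.net.ServerSocket;

class JerseyPortSketch {
    static final String PORT_PROPERTY = "jersey.test.port";

    // Prefer a port supplied by the user through the system property;
    // fall back to the default when the property is unset or not a number.
    static int resolveStartPort(int defaultPort) {
        String configured = System.getProperty(PORT_PROPERTY);
        if (configured == null || configured.trim().isEmpty()) {
            return defaultPort;
        }
        try {
            return Integer.parseInt(configured.trim());
        } catch (NumberFormatException e) {
            return defaultPort;
        }
    }

    // Stand-in probe: return the first port in [start, start + retries] that
    // can be bound at this moment. The probe socket is closed again before
    // returning, so another process can still grab the port afterwards.
    static int findFreePort(int start, int retries) throws IOException {
        int last = Math.min(start + retries, 65535);
        for (int port = start; port <= last; port++) {
            try (ServerSocket probe = new ServerSocket(port)) {
                return port;
            } catch (IOException e) {
                // Port is busy (or could not be closed cleanly); try the next one.
            }
        }
        throw new IOException("No free port in [" + start + ", " + last + "]");
    }

    public static void main(String[] args) throws IOException {
        int start = resolveStartPort(9998);
        System.out.println("Using port " + findFreePort(start, 10));
    }
}
```

Because the probe socket is released before the test container binds, this honors the user's setting but still leaves the window between picking and binding that the thread is worried about.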
[jira] [Updated] (YARN-4932) (Umbrella) YARN/MR test failures on Windows
[ https://issues.apache.org/jira/browse/YARN-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4932: - Summary: (Umbrella) YARN/MR test failures on Windows (was: (Umbrella) YARN test failures on Windows) > (Umbrella) YARN/MR test failures on Windows > --- > > Key: YARN-4932 > URL: https://issues.apache.org/jira/browse/YARN-4932 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Junping Du > > We found several test failures related to Windows. Here is Umbrella jira to > track them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237102#comment-15237102 ] Sunil G commented on YARN-4909: --- I agree with [~vvasudev]. The current fix will make this issue less frequent, but it will not solve it 100%. I think some recently added test case is not closing its port; we need to dig into that. However, the newly suggested approach looks fine. We can try adding +/- 1 to the port.
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237083#comment-15237083 ] Varun Vasudev commented on YARN-4909: - Couple of points about the patch - # Don't undo the JerseyTest options to set a port. If a user has provided a port via the system properties, we should honor it. # I'm not convinced the current patch will fix the issue - it'll probably make the occurrences less frequent. The problem is that the point at which we pick the port and the point at which we bind are different. So two tests could both start up, check that 9998 is free and there's a race to bind to it first. What you could do is add/subtract a small random number from 9998 so that two containers that start up together don't go for the same port. What do you think?
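The race described in this comment comes from picking a port and binding to it in two separate steps. A minimal sketch of the random-offset idea, combined with binding immediately and retrying on `BindException`, is shown below. It is not the YARN-4909 patch; `RandomOffsetPort`, `bindNear`, the spread and the attempt count are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;

class RandomOffsetPort {
    // Hypothetical helper: pick base +/- a small random offset and bind at
    // once, so "is it free?" and "bind" are the same operation. On a
    // collision, pick another offset instead of failing the test.
    static ServerSocket bindNear(int base, int spread, int attempts) throws IOException {
        for (int i = 0; i < attempts; i++) {
            int offset = ThreadLocalRandom.current().nextInt(-spread, spread + 1);
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress("localhost", base + offset));
                return socket;
            } catch (BindException e) {
                socket.close(); // someone else won this port; try another one
            }
        }
        throw new BindException("No free port found near " + base);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket first = bindNear(9998, 50, 20);
             ServerSocket second = bindNear(9998, 50, 20)) {
            System.out.println(first.getLocalPort() + " and " + second.getLocalPort());
        }
    }
}
```

A random offset alone only lowers the collision probability, as the later replies note; the retry on `BindException` is what absorbs the remaining collisions. Whether a test container can be handed an already-bound socket is a separate question that this sketch does not address.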
[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237071#comment-15237071 ] Hadoop QA commented on YARN-4940: -
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 1 new + 62 unchanged - 0 fixed = 63 total (was 62) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 56s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 39s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 135m 21s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
| | hadoop.yarn.webapp.TestRMWithCSRFFilter |
| | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
| | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| |
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237043#comment-15237043 ] Bibin A Chundatt commented on YARN-4909: Yes, we are running in parallel. {noformat} mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hadoop-trunk-0 -Ptest-patch -Pparallel-tests -P!shelltest -Pnative -Drequire.libwebhdfs -Drequire.snappy -Drequire.openssl -Drequire.fuse -Drequire.test.libhadoop clean test -fae > /testptch/hadoop/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95.txt 2>&1 {noformat}
[jira] [Commented] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes
[ https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237040#comment-15237040 ] Bibin A Chundatt commented on YARN-4947: IIUC {{MockRM#drainEvents()}} will loop forever. {{GenericEventHandler}} adds the event, but it is never drained because the Dispatcher is never started, so {{isDrained}} always returns false. {noformat} public void await() { while (!isDrained()) { Thread.yield(); } } {noformat} Attaching a patch to fix this. > Test timeout is happening for TestRMWebServicesNodes > > > Key: YARN-4947 > URL: https://issues.apache.org/jira/browse/YARN-4947 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4947.patch > > > Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 > [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
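The failure mode described in this comment can be reproduced with a toy dispatcher: if events are queued but the consuming thread is never started, a loop of the form `while (!isDrained())` never exits. The sketch below is an illustration under that assumption; `DrainSketch` and `awaitDrained` are hypothetical and are not YARN's dispatcher or the attached patch. It adds a deadline so a forgotten `start()` shows up as a `false` return instead of a hung test.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class DrainSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // Queue an event; nothing runs it until start() has been called.
    void handle(Runnable event) {
        queue.add(event);
    }

    // Simplification: "drained" here only means the queue is empty.
    boolean isDrained() {
        return queue.isEmpty();
    }

    // Start the single consumer thread that takes and runs queued events.
    void start() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    queue.take().run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Same spin as the quoted await(), but bounded by a deadline.
    boolean awaitDrained(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!isDrained()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.yield();
        }
        return true;
    }

    public static void main(String[] args) {
        DrainSketch neverStarted = new DrainSketch();
        neverStarted.handle(() -> { });
        System.out.println("drained without start(): " + neverStarted.awaitDrained(100));
    }
}
```

A bounded wait does not fix the missing dispatcher start; it only turns an infinite spin into a fast, diagnosable failure.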
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237038#comment-15237038 ] Varun Vasudev commented on YARN-4909: - TestRMWebServices has been in the code for a long time. What's started these failures? Are we running tests in parallel?
[jira] [Commented] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237032#comment-15237032 ] Hadoop QA commented on YARN-4948:
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 11s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 37s | trunk passed |
| +1 | compile | 1m 46s | trunk passed with JDK v1.8.0_77 |
| +1 | compile | 2m 8s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 36s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | mvneclipse | 0m 26s | trunk passed |
| +1 | findbugs | 2m 15s | trunk passed |
| +1 | javadoc | 1m 5s | trunk passed with JDK v1.8.0_77 |
| +1 | javadoc | 3m 33s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| -1 | mvninstall | 0m 21s | hadoop-yarn-common in the patch failed. |
| -1 | compile | 0m 29s | hadoop-yarn in the patch failed with JDK v1.8.0_77. |
| -1 | javac | 0m 29s | hadoop-yarn in the patch failed with JDK v1.8.0_77. |
| -1 | compile | 0m 34s | hadoop-yarn in the patch failed with JDK v1.7.0_95. |
| -1 | javac | 0m 34s | hadoop-yarn in the patch failed with JDK v1.7.0_95. |
| -1 | checkstyle | 0m 33s | hadoop-yarn-project/hadoop-yarn: patch generated 14 new + 212 unchanged - 0 fixed = 226 total (was 212) |
| -1 | mvnsite | 0m 23s | hadoop-yarn-common in the patch failed. |
| +1 | mvneclipse | 0m 21s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 14 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| -1 | findbugs | 0m 12s | hadoop-yarn-common in the patch failed. |
| +1 | javadoc | 1m 3s | the patch passed with JDK v1.8.0_77 |
| +1 | javadoc | 3m 22s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 0m 19s | hadoop-yarn-api in the patch failed with JDK v1.8.0_77. |
| -1 | unit | 0m 16s | hadoop-yarn-common in the patch failed with JDK v1.8.0_77. |
| -1 | unit | 0m 22s | hadoop-yarn-api in the patch failed with JDK v1.7.0_95. |
| -1 | unit | 0m 20s | hadoop-yarn-common in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | Patch does
[jira] [Updated] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes
[ https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4947: --- Attachment: 0001-YARN-4947.patch > Test timeout is happening for TestRMWebServicesNodes > > > Key: YARN-4947 > URL: https://issues.apache.org/jira/browse/YARN-4947 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4947.patch > > > Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 > [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237024#comment-15237024 ] Varun Saxena commented on YARN-3816: I had a quick scan of the patch. There seem to be multiple aggregation operations. If we append the operation to the column qualifier, then with 4 aggregation operations we would need to create 4 single column value filters for a single metric. For example, if the metric filter says metric1 > 40, we would have to create a filter list like metric1=SUM > 40 OR metric1=AVG > 40 OR metric1=NOOP > 40 and so on. Will these aggregation operations be required by offline aggregation (YARN-3817)? If yes, can there be some other mechanism to indicate these aggregation operations instead of appending them to the column qualifier? Configuring it in some way was a suggestion given earlier. cc [~sjlee0] > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-feature-YARN-2928.v4.1.patch, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g.
HDFS_BYTES_READ, should be > aggregated to show details of states at the framework level. > - Aggregation at other levels (Flow/User/Queue) can be more efficient when based > on application-level aggregations rather than raw entity-level data, as far > fewer rows need to be scanned (non-aggregated entities such as > events, configurations, etc. are filtered out). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
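The concern raised in the comment above, that one metric filter fans out into one filter per aggregation operation when the operation is part of the column qualifier, can be illustrated with a minimal sketch. Everything here is hypothetical: the class name, the method name, and the list of operations (only the three named in the comment; the full set in the patch may differ) are assumptions, and real code would build HBase filter objects rather than a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the YARN-3816 patch.
class MetricFilterExpansion {
    // Assumption: only the operations named in the comment are listed here.
    static final String[] AGG_OPS = {"SUM", "AVG", "NOOP"};

    // Expands a single metric comparison into one clause per aggregation
    // operation, joined with OR, mirroring the filter list in the comment.
    static String expand(String metric, String comparison, long threshold) {
        List<String> clauses = new ArrayList<>();
        for (String op : AGG_OPS) {
            clauses.add(metric + "=" + op + " " + comparison + " " + threshold);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // Prints the expanded filter expression for "metric1 > 40".
        System.out.println(expand("metric1", ">", 40));
    }
}
```

The number of clauses grows with the number of aggregation operations, which is the cost the comment asks to avoid by indicating the operation some other way.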
[jira] [Commented] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237023#comment-15237023 ] Hadoop QA commented on YARN-4948:
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| 0 | docker | 0m 5s | Dockerfile '/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/dev-support/docker/Dockerfile' not found, falling back to built-in. |
| -1 | docker | 11m 11s | Docker failed to build yetus/hadoop:date2016-04-12. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12798246/YARN-4948-branch-2.7.0.001.patch |
| JIRA Issue | YARN-4948 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11048/console |
| Powered by | Apache Yetus 0.2.0 http://yetus.apache.org |

This message was automatically generated.
> Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: YARN-4948-branch-2.7.0.001.patch, YARN-4948.001.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237021#comment-15237021 ] Naganarasimha G R commented on YARN-4948: - Oops, it seems you updated the patch at almost the same time I changed the status. Anyway, as the patch does not apply to trunk, please rebase it and then change the status. Also, IMO it would be better to wait until the interface is developed as part of YARN-4231. [~wangda], can you assign this jira to [~wjlei] and add me to the committers list? > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: YARN-4948-branch-2.7.0.001.patch, YARN-4948.001.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4168) Test TestLogAggregationService.testLocalFileDeletionOnDiskFull failing
[ https://issues.apache.org/jira/browse/YARN-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237020#comment-15237020 ] Takashi Ohnishi commented on YARN-4168: --- Thank you [~vinodkv] for reviewing and committing! :) > Test TestLogAggregationService.testLocalFileDeletionOnDiskFull failing > -- > > Key: YARN-4168 > URL: https://issues.apache.org/jira/browse/YARN-4168 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Takashi Ohnishi >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4168.1.patch, YARN-4168.2.patch, YARN-4168.3.patch > > > {{TestLogAggregationService.testLocalFileDeletionOnDiskFull}} failing on > [Jenkins build > 1136|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk/1136/testReport/junit/org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation/TestLogAggregationService/testLocalFileDeletionOnDiskFull/] > {code} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at org.junit.Assert.assertFalse(Assert.java:74) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:229) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:285) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jialei weng updated YARN-4948: -- Attachment: YARN-4948-branch-2.7.0.001.patch > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: YARN-4948-branch-2.7.0.001.patch, YARN-4948.001.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237007#comment-15237007 ] sandflee commented on YARN-4940: Rather than converting UnknownNodeId, using NodeId seems simpler and more reasonable. cc [~jlowe] [~kshukla] > yarn node -list -all failed if RM start with decommissioned node > > > Key: YARN-4940 > URL: https://issues.apache.org/jira/browse/YARN-4940 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4940.01.patch, YARN-4940.02.patch > > > 1, add a node to exclude file > 2, start RM > 3, run yarn node -list -all , see the following exception > {quote} > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at > com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at >
org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at >
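The trace above reports that an UnknownNodeId instance cannot be cast to NodeIdPBImpl during serialization. The following minimal sketch reproduces that failure pattern in isolation. All class names are hypothetical stand-ins (they are not the Hadoop classes), and the sketch only assumes what the trace states: the serializing code downcasts an abstract identifier to one concrete implementation.

```java
// Hypothetical stand-ins; none of these are the real Hadoop classes.
class CastFailureDemo {
    // Mirrors the failing pattern: the serializer assumes every identifier
    // is the protobuf-backed implementation and downcasts unconditionally.
    static String serialize(SketchNodeId id) {
        SketchNodeIdPBImpl pb = (SketchNodeIdPBImpl) id;
        return pb.host();
    }

    // Reports whether serialization fails with a ClassCastException.
    static boolean failsToSerialize(SketchNodeId id) {
        try {
            serialize(id);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsToSerialize(new SketchNodeIdPBImpl("host1")));
        System.out.println(failsToSerialize(new SketchUnknownNodeId("host2")));
    }
}

// Abstract identifier type, standing in for NodeId.
abstract class SketchNodeId {
    abstract String host();
}

// The only subtype the serializer expects, standing in for NodeIdPBImpl.
class SketchNodeIdPBImpl extends SketchNodeId {
    private final String host;
    SketchNodeIdPBImpl(String host) { this.host = host; }
    String host() { return host; }
}

// A second subtype the serializer does not know about, standing in for UnknownNodeId.
class SketchUnknownNodeId extends SketchNodeId {
    private final String host;
    SketchUnknownNodeId(String host) { this.host = host; }
    String host() { return host; }
}
```

Under these assumptions, any identifier created outside the expected implementation breaks the serializer, which is consistent with the suggestion in the comment to use a plain NodeId instead of a special subclass.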
[jira] [Updated] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jialei weng updated YARN-4948: -- Attachment: YARN-4948.001.patch > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: YARN-4948.001.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237005#comment-15237005 ] Bibin A Chundatt commented on YARN-4909: [~Naganarasimha] Thank you for looking into the patch. The test case failures are already tracked as part of the umbrella YARN-4478. > Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter > --- > > Key: YARN-4909 > URL: https://issues.apache.org/jira/browse/YARN-4909 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Brahma Reddy Battula >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, > 0003-YARN-4909.patch, 0004-YARN-4909.patch > > > *Precommit link* > https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/ > *Trace* > {noformat} > com.sun.jersey.test.framework.spi.container.TestContainerException: > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) > at > org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) > at > org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) > at >
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79) > at > com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342) > at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217) > at > org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jialei weng updated YARN-4948: -- Attachment: (was: Node-labels-store-in-zookeeper.patch) > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4940: --- Attachment: YARN-4940.02.patch > yarn node -list -all failed if RM start with decommissioned node > > > Key: YARN-4940 > URL: https://issues.apache.org/jira/browse/YARN-4940 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4940.01.patch, YARN-4940.02.patch > > > 1, add a node to exclude file > 2, start RM > 3, run yarn node -list -all , see the following exception > {quote} > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at > com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at > 
org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at >
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237000#comment-15237000 ] Naganarasimha G R commented on YARN-4909: - Hi [~bibinchundatt], I am fine with the approach taken. If we don't override *getPort*, we need to have two methods, one for @Before and one for @BeforeClass, and in any case *JerseyTest.getPort* only reads the port from the system property and validates whether it is configured correctly. All the other test case failures reported in the latest report are not related to the patch; [~bibinchundatt], please confirm the same. [~wangda], if you are fine with the approach, I will go ahead and commit. > Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter > --- > > Key: YARN-4909 > URL: https://issues.apache.org/jira/browse/YARN-4909 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Brahma Reddy Battula >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, > 0003-YARN-4909.patch, 0004-YARN-4909.patch > > > *Precommit link* > https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/ > *Trace* > {noformat} > com.sun.jersey.test.framework.spi.container.TestContainerException: > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) > at > org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) > at > org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) > at >
com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79) > at > com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342) > at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217) > at > org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
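The BindException in the trace above arises when a test server tries to bind a port that is already in use. One common way to avoid a fixed port in tests, shown here only as a sketch and not as what the YARN-4909 patch does, is to ask the operating system for an unused port by binding to port 0. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper; not taken from the YARN-4909 patch.
class FreePortFinder {
    // Binding to port 0 lets the OS pick any currently unused port.
    // The socket is closed before returning, so another process could
    // still claim the port before the caller binds it (a small race).
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("free port: " + findFreePort());
    }
}
```

A test base class could return such a value from an overridden port accessor, which is the general direction the comment about overriding the port method points at; whether the patch does exactly this is not stated in the thread.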
[jira] [Updated] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jialei weng updated YARN-4948: -- Attachment: (was: Node-labels-store-in-zookeeper-2.7.0.patch) > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: Node-labels-store-in-zookeeper.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jialei weng updated YARN-4948: -- Attachment: Node-labels-store-in-zookeeper.patch > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: Node-labels-store-in-zookeeper.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236949#comment-15236949 ] Hadoop QA commented on YARN-4948:
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 4s | YARN-4948 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12798236/Node-labels-store-in-zookeeper-2.7.0.patch |
| JIRA Issue | YARN-4948 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11045/console |
| Powered by | Apache Yetus 0.2.0 http://yetus.apache.org |

This message was automatically generated.
> Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: Node-labels-store-in-zookeeper-2.7.0.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jialei weng updated YARN-4948: -- Attachment: Node-labels-store-in-zookeeper-2.7.0.patch > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > Attachments: Node-labels-store-in-zookeeper-2.7.0.patch > > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4940) yarn node -list -all failed if RM start with decommissioned node
[ https://issues.apache.org/jira/browse/YARN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4940: --- Attachment: YARN-4940.01.patch > yarn node -list -all failed if RM start with decommissioned node > > > Key: YARN-4940 > URL: https://issues.apache.org/jira/browse/YARN-4940 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4940.01.patch > > > 1, add a node to exclude file > 2, start RM > 3, run yarn node -list -all , see the following exception > {quote} > Exception in thread "main" java.lang.ClassCastException: > org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId > cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:251) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:287) > at > org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:224) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.convertToProtoFormat(GetClusterNodesResponsePBImpl.java:172) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.access$000(GetClusterNodesResponsePBImpl.java:38) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:152) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl$1$1.next(GetClusterNodesResponsePBImpl.java:141) > at > com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336) > at > com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323) > at > org.apache.hadoop.yarn.proto.YarnServiceProtos$GetClusterNodesResponseProto$Builder.addAllNodeReports(YarnServiceProtos.java:21485) > 
at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.addLocalNodeManagerInfosToProto(GetClusterNodesResponsePBImpl.java:164) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToBuilder(GetClusterNodesResponsePBImpl.java:99) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.mergeLocalToProto(GetClusterNodesResponsePBImpl.java:106) > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetClusterNodesResponsePBImpl.getProto(GetClusterNodesResponsePBImpl.java:71) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:284) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:493) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) > at > 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:302) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at >
[jira] [Commented] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236934#comment-15236934 ] Naganarasimha G R commented on YARN-4948: - [~wangda], if that is fine with you, we can move this JIRA under YARN-2492 as a sub-task. > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > > Support node labels store in zookeeper -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4948) Support node labels store in zookeeper
[ https://issues.apache.org/jira/browse/YARN-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236931#comment-15236931 ] Naganarasimha G R commented on YARN-4948: - Hi [~wjlei], I think this is very much needed, and it is related to YARN-4881. Do you have a patch ready for this? As no patch is attached, I am changing the status. If you want to work on it, I would suggest waiting for YARN-4231, which exposes an interface, and then basing your implementation on it. Please let me know so that I can assign it to you. cc/ [~wangda], thoughts? > Support node labels store in zookeeper > -- > > Key: YARN-4948 > URL: https://issues.apache.org/jira/browse/YARN-4948 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: jialei weng > > Support node labels store in zookeeper
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236918#comment-15236918 ] Bibin A Chundatt commented on YARN-4909: Added YARN-4947 to track {{org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes}} timeout > Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter > --- > > Key: YARN-4909 > URL: https://issues.apache.org/jira/browse/YARN-4909 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Brahma Reddy Battula >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, > 0003-YARN-4909.patch, 0004-YARN-4909.patch > > > *Precommit link* > https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/ > *Trace* > {noformat} > com.sun.jersey.test.framework.spi.container.TestContainerException: > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) > at > org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) > at > org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) > at > 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79) > at > com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342) > at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217) > at > org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
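A note on the BindException in the YARN-4909 trace above: one common way to avoid "Address already in use" in tests is to let the OS pick an unused port by binding to port 0 instead of hard-coding a port. The sketch below only illustrates that general technique. It is not taken from the attached patches, and FreePortFinder is a hypothetical helper name, not a Hadoop or Jersey class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration only; not part of Hadoop or Jersey.
final class FreePortFinder {
    private FreePortFinder() {}

    /** Asks the OS for a currently unused TCP port by binding to port 0. */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            // The OS has assigned an ephemeral port; report it and release the socket.
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not probe for a free port", e);
        }
    }
}
```

Another process can still grab the port between this probe and the real bind, so the approach narrows the race rather than removing it.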
[jira] [Created] (YARN-4948) Support node labels store in zookeeper
jialei weng created YARN-4948: - Summary: Support node labels store in zookeeper Key: YARN-4948 URL: https://issues.apache.org/jira/browse/YARN-4948 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.7.0 Reporter: jialei weng Support node labels store in zookeeper
[jira] [Updated] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes
[ https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4947: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4478 > Test timeout is happening for TestRMWebServicesNodes > > > Key: YARN-4947 > URL: https://issues.apache.org/jira/browse/YARN-4947 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 > [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/]
[jira] [Commented] (YARN-4893) Fix some intermittent test failures in TestRMAdminService
[ https://issues.apache.org/jira/browse/YARN-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236909#comment-15236909 ] Bibin A Chundatt commented on YARN-4893: A test timeout is happening for TestRMWebServicesNodes; added YARN-4947 to track it. > Fix some intermittent test failures in TestRMAdminService > - > > Key: YARN-4893 > URL: https://issues.apache.org/jira/browse/YARN-4893 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Brahma Reddy Battula >Priority: Blocker > Fix For: 2.8.0 > > Attachments: YARN-4893-002.patch, YARN-4893-003.patch, YARN-4893.patch > > > As discussed in YARN-998, we need to add rm.drainEvents() after > rm.registerNode(), or some tests could fail intermittently. Also, we > could consider calling rm.drainEvents() within rm.registerNode(), which would be > more convenient.
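The rm.drainEvents() suggestion in the YARN-4893 description can be illustrated with a toy model. ToyDispatcher and ToyResourceManager below are hypothetical stand-ins written for this note, not the YARN test classes. The point they show is that registration is handled asynchronously, so a test that checks state right after registerNode() may race with the dispatcher unless it drains pending events first.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Toy asynchronous dispatcher (hypothetical, for illustration only).
final class ToyDispatcher {
    // Daemon worker thread so the JVM can exit even if nobody shuts it down.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "toy-dispatcher");
        t.setDaemon(true);
        return t;
    });
    private final AtomicInteger pending = new AtomicInteger();

    /** Queues an event; it is handled later on the worker thread. */
    void dispatch(Runnable event) {
        pending.incrementAndGet();
        worker.execute(() -> {
            try {
                event.run();
            } finally {
                pending.decrementAndGet();
            }
        });
    }

    /** Blocks until every event dispatched so far has been handled. */
    void drainEvents() {
        while (pending.get() > 0) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining", e);
            }
        }
    }
}

// Toy resource manager whose node registration goes through the dispatcher.
final class ToyResourceManager {
    private final ToyDispatcher dispatcher = new ToyDispatcher();
    private final Set<String> activeNodes = ConcurrentHashMap.newKeySet();

    /** Returns immediately; the node only becomes active once the event is handled. */
    void registerNode(String nodeId) {
        dispatcher.dispatch(() -> activeNodes.add(nodeId));
    }

    void drainEvents() {
        dispatcher.drainEvents();
    }

    int activeNodeCount() {
        return activeNodes.size();
    }
}
```

In a test against this toy model, calling rm.drainEvents() right after rm.registerNode(...) makes the following assertion on activeNodeCount() deterministic. Folding the drain into registerNode() itself, as the description also proposes, would give callers the same guarantee without the extra call.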
[jira] [Created] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes
Bibin A Chundatt created YARN-4947: -- Summary: Test timeout is happening for TestRMWebServicesNodes Key: YARN-4947 URL: https://issues.apache.org/jira/browse/YARN-4947 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/]
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236902#comment-15236902 ] Naganarasimha G R commented on YARN-3971: - Thanks for the latest patch, [~bibinchundatt]; the patch LGTM. [~wangda], if you are OK with the approach taken, I will go ahead and commit the addendum patch. > Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel > recovery > -- > > Key: YARN-3971 > URL: https://issues.apache.org/jira/browse/YARN-3971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Fix For: 2.8.0 > > Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, > 0003-YARN-3971.patch, 0004-YARN-3971.patch, > 0005-YARN-3971.001.addendum.patch, 0005-YARN-3971.addendum.patch, > 0005-YARN-3971.patch > > > Steps to reproduce > # Create labels x,y > # Delete labels x,y > # Create labels x,y again and add capacity scheduler XML configuration for labels x and y too > # Restart RM > > Both RMs will become Standby, > since the exception below is thrown in {{FileSystemNodeLabelsStore#recover}} > {code} > 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in > state STARTED; cause: java.io.IOException: Cannot remove label=x, because > queue=a1 is using this label. Please remove label on queue before remove the > label > java.io.IOException: Cannot remove label=x, because queue=a1 is using this > label. 
Please remove label on queue before remove the label > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > {code}
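The failure in the YARN-3971 reproduction steps above can be modelled in a few lines. The sketch below is a toy, assuming a label store that replays a log of add and remove operations on startup. ToyLabelManager, its methods, and the "ADD:x" log format are all hypothetical and are not the RMNodeLabelsManager API. It shows the idea in the issue title: the "label is in use by a queue" check is skipped while replaying history, but still applies to live removals.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy label manager (hypothetical names; not the YARN classes).
final class ToyLabelManager {
    private final Set<String> clusterLabels = new HashSet<>();
    private final Map<String, Set<String>> labelsByQueue = new HashMap<>();
    private boolean recovering;

    /** Records which labels a queue is configured to use. */
    void configureQueue(String queue, String... labels) {
        labelsByQueue.put(queue, new HashSet<>(Arrays.asList(labels)));
    }

    void addLabel(String label) {
        clusterLabels.add(label);
    }

    boolean hasLabel(String label) {
        return clusterLabels.contains(label);
    }

    /** Removes a label; the queue-usage check is skipped while recovering. */
    void removeLabel(String label) {
        if (!recovering) {
            for (Map.Entry<String, Set<String>> e : labelsByQueue.entrySet()) {
                if (e.getValue().contains(label)) {
                    throw new IllegalStateException("Cannot remove label=" + label
                        + ", because queue=" + e.getKey() + " is using this label");
                }
            }
        }
        clusterLabels.remove(label);
    }

    /** Replays stored operations such as "ADD:x" or "REMOVE:x" in order. */
    void recover(String... editLog) {
        recovering = true;
        try {
            for (String entry : editLog) {
                String[] parts = entry.split(":", 2);
                if ("ADD".equals(parts[0])) {
                    addLabel(parts[1]);
                } else if ("REMOVE".equals(parts[0])) {
                    removeLabel(parts[1]);
                } else {
                    throw new IllegalArgumentException("Unknown log entry: " + entry);
                }
            }
        } finally {
            recovering = false;
        }
    }
}
```

With queue a1 configured for label x, replaying "ADD:x", "REMOVE:x", "ADD:x" completes and leaves x defined, while a later direct removeLabel("x") is still rejected.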
[jira] [Assigned] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt reassigned YARN-4909: -- Assignee: Bibin A Chundatt > Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter > --- > > Key: YARN-4909 > URL: https://issues.apache.org/jira/browse/YARN-4909 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Brahma Reddy Battula >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: 0001-YARN-4909.patch, 0002-YARN-4909.patch, > 0003-YARN-4909.patch, 0004-YARN-4909.patch > > > *Precommit link* > https://builds.apache.org/job/PreCommit-YARN-Build/10908/testReport/ > *Trace* > {noformat} > com.sun.jersey.test.framework.spi.container.TestContainerException: > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:463) > at sun.nio.ch.Net.bind(Net.java:455) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384) > at > org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375) > at > org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549) > at > org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326) > at > com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219) > at > 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:129) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.(GrizzlyWebTestContainerFactory.java:86) > at > com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79) > at > com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342) > at com.sun.jersey.test.framework.JerseyTest.(JerseyTest.java:217) > at > org.apache.hadoop.yarn.webapp.JerseyTestBase.(JerseyTestBase.java:30) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices.(TestRMWebServices.java:125) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4897) dataTables_wrapper change min height
[ https://issues.apache.org/jira/browse/YARN-4897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236801#comment-15236801 ] Bibin A Chundatt commented on YARN-4897: [~rohithsharma] Thank you for the review and commit. > dataTables_wrapper change min height > > > Key: YARN-4897 > URL: https://issues.apache.org/jira/browse/YARN-4897 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.9.0 > > Attachments: 0001-YARN-4897.patch, Border and DefaultHeight.png > > > For dataTables_wrapper the min height is 302; it needs to be set to > 10px. > On pages containing 2 tables, min_height=302 causes a layout problem. > When dataTables_wrapper is inside a DIV, the border is rendered at min_height.
[jira] [Commented] (YARN-4909) Fix intermittent failures of TestRMWebServices And TestRMWithCSRFFilter
[ https://issues.apache.org/jira/browse/YARN-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236778#comment-15236778 ] Hadoop QA commented on YARN-4909: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 58s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 38s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 53s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 195m 47s {color} | {color:black} {color} | \\ \\ || Reason || Tests || |
[jira] [Commented] (YARN-2567) Add a percentage-node threshold for RM to wait for new allocations after restart/failover
[ https://issues.apache.org/jira/browse/YARN-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236724#comment-15236724 ] sandflee commented on YARN-2567:
There may be one problem: if an NM is recovered in a finished state and then registers with running containers, we would normally kill the containers. That can go wrong as follows:
1. The NM is LOST and the RM stores the LOST status successfully.
2. The RM fails over and the NM is recovered as LOST.
3. The NM registers and becomes RUNNING, {color:red}but the RM's store of the RUNNING state fails or is delayed{color}.
4. The RM allocates a container on the NM, and the container runs on it.
5. The RM fails over again and the NM is recovered as LOST.
6. The NM registers with the RM, and the RM kills the container on it, which is not expected.
To fix this, one solution is to store the NM status first and only then let the NM become RUNNING, but this may delay NM registration on a big cluster.
> Add a percentage-node threshold for RM to wait for new allocations after > restart/failover > - > > Key: YARN-2567 > URL: https://issues.apache.org/jira/browse/YARN-2567 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > This is the remaining part of YARN-2001 - to halt allocations after restart > till x% of nodes sync back with the RM. This is useful for avoiding bad > scheduling during the time the nodes are still joining back after a > restart/failover.
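The store-first ordering proposed in the YARN-2567 comment above can be sketched with a toy model. ToyStateStore, ToyNodeTracker and ToyNodeState are hypothetical names invented for this note, and the failWrites flag simply simulates a failed write to the state store. The sketch assumes registration is allowed to fail and be retried. It shows that if the in-memory transition to RUNNING only happens after the persisted write succeeds, the RM never allocates on a node that a later recovery would see as LOST.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node states for the toy model.
enum ToyNodeState { LOST, RUNNING }

// Toy persistent store; 'failWrites' simulates a store write that fails.
final class ToyStateStore {
    private final Map<String, ToyNodeState> persisted = new HashMap<>();
    boolean failWrites;

    /** Returns true only if the state was durably recorded. */
    boolean store(String nodeId, ToyNodeState state) {
        if (failWrites) {
            return false;
        }
        persisted.put(nodeId, state);
        return true;
    }

    /** What a recovering RM would read back after a failover. */
    ToyNodeState load(String nodeId) {
        return persisted.get(nodeId);
    }
}

// Toy RM-side tracker that applies the store-first ordering.
final class ToyNodeTracker {
    private final ToyStateStore store;
    private final Map<String, ToyNodeState> live = new HashMap<>();

    ToyNodeTracker(ToyStateStore store) {
        this.store = store;
    }

    /** Persist RUNNING first; flip the in-memory state only if that succeeded. */
    boolean register(String nodeId) {
        if (!store.store(nodeId, ToyNodeState.RUNNING)) {
            return false; // node stays non-RUNNING, so nothing is scheduled on it
        }
        live.put(nodeId, ToyNodeState.RUNNING);
        return true;
    }

    /** Containers may only be placed on nodes that are RUNNING in memory. */
    boolean canAllocateOn(String nodeId) {
        return live.get(nodeId) == ToyNodeState.RUNNING;
    }
}
```

The cost noted in the comment is visible here too: register() now waits on the store, so every node registration pays one store write before the node can take containers.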
[jira] [Resolved] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and NN go down at the same time.
[ https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved YARN-3639. Resolution: Duplicate > It takes too long time for RM to recover all apps if the original active RM > and NN go down at the same time. > > > Key: YARN-3639 > URL: https://issues.apache.org/jira/browse/YARN-3639 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Xianyin Xin > Attachments: YARN-3639-recovery_log_1_app.txt > > > If the active RM and NN go down at the same time, the new RM takes a long > time to recover all apps. After analysis, we found that the root cause is the renewal of > HDFS tokens during recovery. The HDFS client created by the renewer > first tries to connect to the original NN, which > times out after 10~20s, and only then does the client try to connect to the new NN. > In our tests, the entire recovery took about 15*#apps seconds.
[jira] [Resolved] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and NN go down at the same time.
[ https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved YARN-3639. Resolution: Fixed > It takes too long time for RM to recover all apps if the original active RM > and NN go down at the same time. > > > Key: YARN-3639 > URL: https://issues.apache.org/jira/browse/YARN-3639 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Xianyin Xin > Attachments: YARN-3639-recovery_log_1_app.txt > > > If the active RM and NN go down at the same time, the new RM takes a long > time to recover all apps. After analysis, we found that the root cause is the renewal of > HDFS tokens during recovery. The HDFS client created by the renewer > first tries to connect to the original NN, which > times out after 10~20s, and only then does the client try to connect to the new NN. > In our tests, the entire recovery took about 15*#apps seconds.
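To put the 15*#apps figure from the YARN-3639 description in perspective, the arithmetic can be written out. The helper below is a hypothetical illustration, and the 1,000-app cluster used in the example is an assumed number, not one taken from the report. It assumes apps are recovered one after another and each pays roughly one connection timeout against the dead NN.

```java
// Hypothetical helper for illustration; not part of Hadoop.
final class RecoveryEstimate {
    private RecoveryEstimate() {}

    /** Serial recovery: each app waits roughly one timeout against the dead NN. */
    static long estimateSeconds(long apps, long timeoutSecondsPerApp) {
        return Math.multiplyExact(apps, timeoutSecondsPerApp);
    }
}
```

For an assumed 1,000 apps at 15 seconds each, estimateSeconds(1000, 15) gives 15,000 seconds, which is 250 minutes, or a little over four hours of recovery.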