[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996545#comment-14996545 ] Hadoop QA commented on YARN-2934:

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 6s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 2m 57s | trunk passed |
| +1 | compile | 4m 13s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 4m 7s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 56s | trunk passed |
| +1 | mvneclipse | 0m 53s | trunk passed |
| -1 | findbugs | 1m 16s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. |
| +1 | javadoc | 2m 15s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 4m 55s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 3m 9s | the patch passed |
| +1 | compile | 4m 34s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 4m 34s | the patch passed |
| +1 | compile | 4m 11s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 4m 11s | the patch passed |
| -1 | checkstyle | 0m 58s | Patch generated 4 new checkstyle issues in root (total was 453, now 454). |
| +1 | mvneclipse | 0m 52s | the patch passed |
| -1 | whitespace | 0m 1s | The patch has 2 line(s) with tabs. |
| +1 | xml | 0m 0s | The patch has no ill-formed XML file. |
| -1 | findbugs | 1m 6s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager introduced 1 new FindBugs issues. |
| +1 | javadoc | 2m 12s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 4m 49s | the patch passed with JDK v1.7.0_79 |
| +1 | unit | 6m 46s | hadoop-common in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 0m 21s | hadoop-yarn-api in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 1m 47s | hadoop-yarn-common in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 8m 26s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 7m 7s | hadoop-common in the patch passed with JDK v1.7.0_79. |
| +1 | unit | 0m 22s | hadoop-yarn-api in the patch passed with JDK v1.7.0_79. |
| +1 | unit | 2m 2s | hadoop-yarn-common in the patch passed with JDK
[jira] [Commented] (YARN-4331) Restarting NodeManager leaves orphaned containers
[ https://issues.apache.org/jira/browse/YARN-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996559#comment-14996559 ] Joseph Francis commented on YARN-4331: -- [~jlowe] Setting yarn.nodemanager.recovery.enabled=true does solve the issue with orphaned containers. Note that the SIGKILL was only done locally to emulate a few production issues we had that caused nodemanagers to fall over. Thanks very much for your clear explanation! > Restarting NodeManager leaves orphaned containers > - > > Key: YARN-4331 > URL: https://issues.apache.org/jira/browse/YARN-4331 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, yarn >Affects Versions: 2.7.1 >Reporter: Joseph Francis >Priority: Critical > > We are seeing a lot of orphaned containers running in our production clusters. > I tried to simulate this locally on my machine and can replicate the issue by > killing nodemanager. > I'm running Yarn 2.7.1 with RM state stored in zookeeper and deploying samza > jobs. > Steps: > {quote}1. Deploy a job > 2. Issue a kill -9 signal to nodemanager > 3. We should see the AM and its container running without nodemanager > 4. AM should die but the container still keeps running > 5. Restarting nodemanager brings up new AM and container but leaves the > orphaned container running in the background > {quote} > This is effectively causing double processing of data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
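The resolution above amounts to enabling NM work-preserving restart. A minimal yarn-site.xml sketch might look like this (yarn.nodemanager.recovery.enabled is the property named in the comment; the recovery directory path below is illustrative, not from this thread):

```xml
<!-- Minimal sketch: enable NM recovery so running containers survive an
     NM restart instead of being orphaned. The recovery.dir value is an
     illustrative local path, not taken from this thread. -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
```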
[jira] [Commented] (YARN-2047) RM should honor NM heartbeat expiry after RM restart
[ https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996513#comment-14996513 ] Jun Gong commented on YARN-2047: Sorry for the late reply. The issue aims to make sure that a lost NM's containers are marked expired by the RM even across RM restart. What I suggested aims to solve the resulting problem in another way. Any thoughts? {quote} If this is a required action then it would also imply that saving a such nodes would be a critical state change operation. So, e.g. decommission command from the admin should not complete until the store has been updated. Is that the case? {quote} Yes, it is. However, the store process is usually very fast, so it should be acceptable. > RM should honor NM heartbeat expiry after RM restart > > > Key: YARN-2047 > URL: https://issues.apache.org/jira/browse/YARN-2047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha > > After the RM restarts, it forgets about existing NM's (and their potentially > decommissioned status too). After restart, the RM cannot maintain the > contract to the AM's that a lost NM's containers will be marked finished > within the expiry time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2047) RM should honor NM heartbeat expiry after RM restart
[ https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996539#comment-14996539 ] Jun Gong commented on YARN-2047: Another thought: the RM rebuilds containers' information from AMs. When an AM re-registers with the RM, it reports its running containers' information to the RM. The RM then records them in a HashSet *amRunningContainers*, looks each one up by calling *getRMContainer(containerId)*, and deletes it from *amRunningContainers* if the RMContainer exists. When an NM re-registers with the RM, the RM deletes all the containers that the NM reports from *amRunningContainers*. After some time (the NM expiry time), the RM iterates over *amRunningContainers* and tells the corresponding AMs that those containers have finished. The result seems the same as what this issue aims for; however, it requires adding to or modifying the AM's register RPC. > RM should honor NM heartbeat expiry after RM restart > > > Key: YARN-2047 > URL: https://issues.apache.org/jira/browse/YARN-2047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha > > After the RM restarts, it forgets about existing NM's (and their potentially > decommissioned status too). After restart, the RM cannot maintain the > contract to the AM's that a lost NM's containers will be marked finished > within the expiry time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
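The bookkeeping described in this comment could be sketched roughly as follows. This is only an illustration of the idea, not RM code: every class and method name except *amRunningContainers* and the *getRMContainer* lookup (mimicked here by a plain map) is invented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the reconciliation idea above: containers reported
// by re-registering AMs are tracked until either an NM re-registers and
// confirms them, or the NM expiry interval passes.
public class AmContainerReconciler {
    // Containers reported by AMs on re-register, not yet confirmed by an NM.
    private final Set<String> amRunningContainers = new HashSet<>();
    // Stand-in for the RM's container table (the getRMContainer lookup).
    private final Map<String, Object> rmContainers = new HashMap<>();

    public void putRMContainer(String containerId, Object rmContainer) {
        rmContainers.put(containerId, rmContainer);
    }

    // AM re-registers and reports its running containers: record only those
    // for which no RMContainer already exists.
    public void onAmReregister(Set<String> reportedContainers) {
        for (String containerId : reportedContainers) {
            if (!rmContainers.containsKey(containerId)) {
                amRunningContainers.add(containerId);
            }
        }
    }

    // NM re-registers: every container it reports is confirmed alive, so it
    // is removed from the unconfirmed set.
    public void onNmReregister(Set<String> nmReportedContainers) {
        amRunningContainers.removeAll(nmReportedContainers);
    }

    // Called once the NM expiry interval elapses: whatever remains was never
    // confirmed by any NM, so the owning AMs are told those containers finished.
    public Set<String> expireUnconfirmed() {
        Set<String> finished = new HashSet<>(amRunningContainers);
        amRunningContainers.clear();
        return finished;
    }
}
```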
[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3946: Attachment: (was: YARN3946_attemptDiagnistic message.png) > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4331) Restarting NodeManager leaves orphaned containers
[ https://issues.apache.org/jira/browse/YARN-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Francis resolved YARN-4331. -- Resolution: Not A Problem > Restarting NodeManager leaves orphaned containers > - > > Key: YARN-4331 > URL: https://issues.apache.org/jira/browse/YARN-4331 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, yarn >Affects Versions: 2.7.1 >Reporter: Joseph Francis >Priority: Critical > > We are seeing a lot of orphaned containers running in our production clusters. > I tried to simulate this locally on my machine and can replicate the issue by > killing nodemanager. > I'm running Yarn 2.7.1 with RM state stored in zookeeper and deploying samza > jobs. > Steps: > {quote}1. Deploy a job > 2. Issue a kill -9 signal to nodemanager > 3. We should see the AM and its container running without nodemanager > 4. AM should die but the container still keeps running > 5. Restarting nodemanager brings up new AM and container but leaves the > orphaned container running in the background > {quote} > This is effectively causing double processing of data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997478#comment-14997478 ] Jason Lowe commented on YARN-4311: -- Sorry, I couldn't find a reference to {{isUntracked}} in trunk nor in YARN-3223, so I'm not sure I understand exactly what is being asked. To be consistent with HDFS, the node should be gracefully decommissioned if it appears in the include and exclude lists simultaneously; otherwise, once it's removed from the include list it's a hard decommission. We could implement a "grace period" where nodes that were removed from the cluster are still "tracked" in the UI for a while before being removed. That may help with some of the potentially confusing cases where a node is accidentally booted from the cluster. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997366#comment-14997366 ] Kuhu Shukla commented on YARN-4311: --- Thank you [~jlowe] for the comments. For graceful refresh nodes, I was looking at YARN-41 and YARN-3223. For this fix, if we remove the node from all lists when isUntracked is true, the decommissioning node falls back to the same behavior as a decommissioned node. Would it be better for both {{refreshNodes}} and {{refreshNodesGracefully}} that if the node is 'untracked' it should be moved to shutdown nodes, irrespective of its previous state and then be taken out of shutdown nodes after a timeout? Let me know if this makes more sense. Thanks! > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997342#comment-14997342 ] Hudson commented on YARN-3840: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #658 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/658/]) YARN-3840. Resource Manager web ui issue when sorting application by id (jianhe: rev 8fbea531d7f7b665f6f55af54c8ebf330118ff37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java > Resource Manager web ui issue when sorting application by id (with > application having id > 9999) > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Mohammad Shahid Khan > Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, > YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, > yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > 9999. > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997805#comment-14997805 ] Hudson commented on YARN-3840: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2528 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2528/]) YARN-3840. Resource Manager web ui issue when sorting application by id (jianhe: rev 8fbea531d7f7b665f6f55af54c8ebf330118ff37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java * 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java > Resource Manager web ui issue when sorting application by id (with > application having id > 9999) > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Mohammad Shahid Khan > Fix For: 2.8.0, 2.7.3 > > Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, > YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, > yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > 9999. > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4051: --- Attachment: YARN-4051.04.patch By default the NM now registers with the RM only after all containers are recovered, and the user can set a timeout value. > ContainerKillEvent is lost when container is In New State and is recovering > > > Key: YARN-4051 > URL: https://issues.apache.org/jira/browse/YARN-4051 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee >Priority: Critical > Attachments: YARN-4051.01.patch, YARN-4051.02.patch, > YARN-4051.03.patch, YARN-4051.04.patch > > > As in YARN-4050, NM event dispatcher is blocked, and container is in New > state, when we finish application, the container still alive even after NM > event dispatcher is unblocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
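The gating described in the patch note above (block RM registration until container recovery finishes, or until a user-set timeout) could be sketched with a latch. This is purely illustrative; the class and method names are invented and do not come from the patch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the NM delays its registration with the RM until
// container recovery completes, or until a configurable timeout elapses.
public class RecoveryGate {
    private final CountDownLatch recoveryDone = new CountDownLatch(1);

    // Called by the recovery path once every container has been recovered.
    public void markRecoveryComplete() {
        recoveryDone.countDown();
    }

    // Called before registering with the RM; returns true if recovery
    // finished in time, false if the timeout expired and registration
    // proceeds anyway.
    public boolean awaitRecovery(long timeoutMs) throws InterruptedException {
        return recoveryDone.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```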
[jira] [Commented] (YARN-1565) Add a way for YARN clients to get critical YARN system properties from the RM
[ https://issues.apache.org/jira/browse/YARN-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997977#comment-14997977 ] Pradeep Subrahmanion commented on YARN-1565: Can anybody help me with how to proceed on this one? > Add a way for YARN clients to get critical YARN system properties from the RM > - > > Key: YARN-1565 > URL: https://issues.apache.org/jira/browse/YARN-1565 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Steve Loughran > Attachments: YARN-1565-001.patch, YARN-1565-002.patch, > YARN-1565-003.patch, YARN-1565-004.patch > > > If you are trying to build up an AM request, you need to know > # the limits of memory, core for the chosen queue > # the existing YARN classpath > # the path separator for the target platform (so your classpath comes out > right) > # cluster OS: in case you need some OS-specific changes > The classpath can be in yarn-site.xml, but a remote client may not have that. > The site-xml file doesn't list Queue resource limits, cluster OS or the path > separator. > A way to query the RM for these values would make it easier for YARN clients > to build up AM submissions with less guesswork and client-side config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4325) purge app state from NM state-store should be independent of log aggregation
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997932#comment-14997932 ] zhangshilong commented on YARN-4325: If the HDFS permissions are right, is there any other problem? If yarn.log-aggregation-enable is set to false, does NM recovery work well? > purge app state from NM state-store should be independent of log aggregation > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > On a long-running cluster, we found tens of thousands of stale apps still > being recovered during NM restart recovery. The reason is a wrong log > aggregation configuration setting: the end-of-log-aggregation events are > never received, so stale apps are not purged properly. We should make sure > the removal of app state is independent of the log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997974#comment-14997974 ] Hadoop QA commented on YARN-4051:

(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 6s | docker + precommit patch detected. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 3m 12s | trunk passed |
| +1 | compile | 0m 50s | trunk passed with JDK v1.8.0_60 |
| +1 | compile | 0m 47s | trunk passed with JDK v1.7.0_79 |
| +1 | checkstyle | 0m 27s | trunk passed |
| +1 | mvneclipse | 0m 37s | trunk passed |
| -1 | findbugs | 1m 17s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. |
| +1 | javadoc | 1m 23s | trunk passed with JDK v1.8.0_60 |
| +1 | javadoc | 3m 45s | trunk passed with JDK v1.7.0_79 |
| +1 | mvninstall | 1m 13s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.8.0_60 |
| +1 | javac | 0m 46s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.7.0_79 |
| +1 | javac | 0m 46s | the patch passed |
| -1 | checkstyle | 0m 25s | Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 265, now 265). |
| +1 | mvneclipse | 0m 37s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | xml | 0m 0s | The patch has no ill-formed XML file. |
| +1 | findbugs | 3m 54s | the patch passed |
| +1 | javadoc | 1m 18s | the patch passed with JDK v1.8.0_60 |
| +1 | javadoc | 3m 48s | the patch passed with JDK v1.7.0_79 |
| +1 | unit | 0m 19s | hadoop-yarn-api in the patch passed with JDK v1.8.0_60. |
| +1 | unit | 1m 46s | hadoop-yarn-common in the patch passed with JDK v1.8.0_60. |
| -1 | unit | 22m 50s | hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_60. |
| +1 | unit | 0m 21s | hadoop-yarn-api in the patch passed with JDK v1.7.0_79. |
| +1 | unit | 2m 2s | hadoop-yarn-common in the patch passed with JDK v1.7.0_79. |
| -1 | unit | 23m 23s | hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_79. |
| +1 | asflicense | 0m
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997807#comment-14997807 ] Naganarasimha G R commented on YARN-4338: - IMHO, I think it's worth a try since null is anyway treated as the default label, so functionally it's fine. Even if it fails I expect some test cases to fail, but it will spare future test cases from having to handle this explicitly. Thoughts? > NPE in RegularContainerAllocator.preCheckForNewContainer() > -- > > Key: YARN-4338 > URL: https://issues.apache.org/jira/browse/YARN-4338 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xinwei Qin >Priority: Minor > Attachments: YARN-4338.001.patch > > > The codes in RegularContainerAllocator.preCheckForNewContainer(): > {code} > if (anyRequest.getNodeLabelExpression() > .equals(RMNodeLabelsManager.NO_LABEL)) { > missedNonPartitionedRequestSchedulingOpportunity = > application > .addMissedNonPartitionedRequestSchedulingOpportunity(priority); > } > {code} > {code}anyRequest.getNodeLabelExpression(){code} may return null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
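A null-safe version of the check quoted in the issue description could look like the sketch below. This is an illustration, not the patch: the class and method are invented, and NO_LABEL stands in for RMNodeLabelsManager.NO_LABEL (assumed here to be the empty string), with null treated as the default label per the comment above.

```java
// Hypothetical sketch of a null-safe replacement for
// anyRequest.getNodeLabelExpression().equals(RMNodeLabelsManager.NO_LABEL),
// which throws an NPE when the request carries no node label expression.
public class NoLabelCheck {
    // Stand-in for RMNodeLabelsManager.NO_LABEL (assumed empty string).
    static final String NO_LABEL = "";

    static boolean isDefaultPartition(String nodeLabelExpression) {
        // A null expression is treated as the default label, and calling
        // equals() on the constant cannot NPE even for non-null inputs.
        return nodeLabelExpression == null
            || NO_LABEL.equals(nodeLabelExpression);
    }
}
```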
[jira] [Created] (YARN-4341) add doc about timeline performance tool usage
Chang Li created YARN-4341: -- Summary: add doc about timeline performance tool usage Key: YARN-4341 URL: https://issues.apache.org/jira/browse/YARN-4341 Project: Hadoop YARN Issue Type: Improvement Reporter: Chang Li Assignee: Chang Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4339) optimize timeline server performance tool
Chang Li created YARN-4339: -- Summary: optimize timeline server performance tool Key: YARN-4339 URL: https://issues.apache.org/jira/browse/YARN-4339 Project: Hadoop YARN Issue Type: Improvement Reporter: Chang Li Assignee: Chang Li As [~Naganarasimha] suggested in YARN-2556, the test could be optimized by having some initial LevelDB data in place before measuring performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4218: --- Attachment: YARN-4218.2.patch > Metric for resource*time that was preempted > --- > > Key: YARN-4218 > URL: https://issues.apache.org/jira/browse/YARN-4218 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, > YARN-4218.patch, YARN-4218.wip.patch, screenshot-1.png, screenshot-2.png, > screenshot-3.png > > > After YARN-415 we have the ability to track the resource*time footprint of a > job and preemption metrics shows how many containers were preempted on a job. > However we don't have a metric showing the resource*time footprint cost of > preemption. In other words, we know how many containers were preempted but we > don't have a good measure of how much work was lost as a result of preemption. > We should add this metric so we can analyze how much work preemption is > costing on a grid and better track which jobs were heavily impacted by it. A > job that has 100 containers preempted that only lasted a minute each and were > very small is going to be less impacted than a job that only lost a single > container but that container was huge and had been running for 3 days. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4340) Add "list" API to reservation system
Carlo Curino created YARN-4340: -- Summary: Add "list" API to reservation system Key: YARN-4340 URL: https://issues.apache.org/jira/browse/YARN-4340 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Sean Po This JIRA tracks changes to the APIs of the reservation system, enabling queries about which reservations exist by "time-range, reservation-id, username". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3862: --- Attachment: YARN-3862-YARN-2928.wip.03.patch > Decide which contents to retrieve and send back in response in TimelineReader > - > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, > YARN-3862-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma separated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support one of the following options: > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997139#comment-14997139 ] Chang Li commented on YARN-2556: Created YARN-4341 to track the work of adding a doc about timeline performance tool usage > Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Chang Li > Labels: BB2015-05-TBR > Fix For: 2.8.0 > > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, > YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, > YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.15.patch, > YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, > YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, > YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997161#comment-14997161 ] Varun Saxena commented on YARN-3862: Attached a WIP patch. This patch attempts to do handling for all the tables (creation of the filter list based on fields) and do prefix matching for configs and filters. The previous WIP patch only attempted to handle the entity table because of the implications of this patch on config and metric filters' matching. I have handled this scenario in this patch. When YARN-3863 is done, some changes will be warranted though (some conditions to pass config and metric filters will have to be removed). I have added a few tests to test the change as well. I have still not hooked up this code to the REST API layer. For that, we first need to decide whether the TimelineFilter code will be part of our object model or not. For prefix matching of configs and metrics to return, at the REST layer this can simply come as a query param (a comma separated list). But when we code for complex filters (especially metric filters) in YARN-3863, we will have to support SQL-type queries with ANDs, ORs, >, <, = operators, etc. If we make TimelineFilter part of our client object model and interpret filters as a JSON string associated with a query param, we might have to rethink a few of the classes and include additional checks (as this will be used by the client). This can increase the size of the URL though. If we do not include the filter as part of our object model, we will have to decide how to specify complex config and metric filters containing ANDs, ORs and different relational operators (because some of the symbols will be reserved) and reach a consensus on that. 
> Decide which contents to retrieve and send back in response in TimelineReader > - > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma separated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support one of the following options: > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
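The prefix-match option discussed in the comment above (a comma-separated list of config/metric key prefixes passed as a query param) could be sketched as follows. This is illustrative only: `parsePrefixes` and `filterByPrefix` are hypothetical names, not the actual YARN-3862 API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: the REST layer receives a comma-separated list of key prefixes and
// the reader returns only the config entries whose keys match some prefix.
public class PrefixFilterSketch {
    // Split the raw query param; an absent/empty param means "no filter".
    public static List<String> parsePrefixes(String queryParam) {
        if (queryParam == null || queryParam.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(queryParam.split(","));
    }

    // Keep only entries whose key starts with one of the requested prefixes.
    public static Map<String, String> filterByPrefix(
            Map<String, String> configs, List<String> prefixes) {
        if (prefixes.isEmpty()) {
            return configs;  // no filter specified: return everything
        }
        return configs.entrySet().stream()
            .filter(e -> prefixes.stream()
                .anyMatch(p -> e.getKey().startsWith(p)))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, String> configs = new HashMap<>();
        configs.put("yarn.nodemanager.vmem-check-enabled", "false");
        configs.put("mapreduce.map.memory.mb", "2048");
        System.out.println(
            filterByPrefix(configs, parsePrefixes("yarn.")).keySet());
    }
}
```

A regex or grouped-query variant would differ only in the predicate inside `filterByPrefix`.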
[jira] [Commented] (YARN-4132) Nodemanagers should try harder to connect to the RM
[ https://issues.apache.org/jira/browse/YARN-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996997#comment-14996997 ] Jason Lowe commented on YARN-4132: -- Thanks for updating the patch, Chang! createRMProxy(conf, protocol, instance) should be implemented in terms of createRMProxy(retryTime, retryInterval, conf, protocol, instance) rather than copying the code. It can do the conf lookups to get the retry values and call the other. Then I don't see a need to check for -1 values. ".rm." should be ".resourcemanager.". There's already precedent in the nodemanager.resourcemanager.minimum.version property. Similarly "retry.ms" should be "retry-interval.ms" to be consistent with the existing resourcemanager properties. The added test takes a long time to run for just one test (around 25 seconds); please tune down the retry intervals. Style nit: usually extra parameters for a function overload of an existing function are passed at the end of the other form. Not a must-fix. > Nodemanagers should try harder to connect to the RM > --- > > Key: YARN-4132 > URL: https://issues.apache.org/jira/browse/YARN-4132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4132.2.patch, YARN-4132.3.patch, YARN-4132.4.patch, > YARN-4132.patch > > > Being part of the cluster, nodemanagers should try very hard (and possibly > never give up) to connect to a resourcemanager. Minimally we should have a > separate config to set how aggressively a nodemanager will connect to the RM > separate from what clients will do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
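The delegation pattern the review asks for, one overload reading the retry values from the configuration and calling the other, could be sketched like this. Everything here is simplified: the `Map`-based conf, the property names, and the `String` return type stand in for the real `Configuration`/`RMProxy` machinery, and the property names are assumptions, not the names the patch finally used.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the short overload looks up the retry settings and delegates, so the
// retry logic lives in exactly one place and no -1 sentinel checks are needed.
public class RMProxySketch {
    // Assumed property names, following the ".resourcemanager." convention the
    // review requests; not necessarily the names used in the actual patch.
    static final String RETRY_TIME_KEY =
        "yarn.nodemanager.resourcemanager.connect.max-wait.ms";
    static final String RETRY_INTERVAL_KEY =
        "yarn.nodemanager.resourcemanager.connect.retry-interval.ms";

    // Overload without explicit retry values: read the conf, then delegate.
    public static String createRMProxy(Map<String, String> conf, String protocol) {
        long retryTime =
            Long.parseLong(conf.getOrDefault(RETRY_TIME_KEY, "900000"));
        long retryInterval =
            Long.parseLong(conf.getOrDefault(RETRY_INTERVAL_KEY, "30000"));
        return createRMProxy(retryTime, retryInterval, conf, protocol);
    }

    // Overload with explicit retry values: the single implementation.
    public static String createRMProxy(long retryTime, long retryInterval,
                                       Map<String, String> conf, String protocol) {
        return protocol + " proxy, waiting up to " + retryTime
            + " ms, retrying every " + retryInterval + " ms";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(RETRY_INTERVAL_KEY, "1000");
        System.out.println(createRMProxy(conf, "ResourceTracker"));
    }
}
```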
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997121#comment-14997121 ] Varun Saxena commented on YARN-4053: Vrushali, thanks for your comments. I would like to work on this. Let me take a stab at this one; I will have the bandwidth. I hope that's fine. You can help me with the reviews. Coming to the points, I agree that a flag is not good for extensibility. As I said earlier, a flag should be fine for now as we have only 2 choices (generic or long) and we can extend later. But eventually we will have to have different handlers for different types, so why not do it now. Hence, let's go with the proposal above. Moreover, yes, we need to have proper handling based on data type or a conversion mechanism in FlowScanner too. As mentioned in an earlier comment, I was thinking we could indicate this in attributes, but I guess your proposal sounds better. We can identify the column/column prefix in the flow scanner as well and convert based on the converter attached to it. bq. it missed one of the places in the current patch for example Which place? MIN/MAX handling? bq. For single value vs time series, we suggest using a column prefix to distinguish them Do we need to differentiate between SINGLE_VALUE and TIME_SERIES if by default it will be read as SINGLE_VALUE? Because we will be storing multiple values even for a metric of type SINGLE_VALUE. Do you mean that on the read side, only the latest value of a metric is to be returned if it's of type SINGLE_VALUE (even if the client asks for TIME_SERIES)? Again, the assumption here is that the client will always send the metric type (SINGLE_VALUE or TIME_SERIES) consistently. bq. For the read path, we can assume it is a single value unless specifically specified by the client as a time series (as clients would need to intend to read time series explicitly). We can return TIME_SERIES by indicating something like METRICS_TIME_SERIES as fields. 
If we do so, it will have implications on YARN-3862. Now the question is whether to return values for multiple timestamps even for a metric of type SINGLE_VALUE if the client asks for them. What if the client wants to see values of a gauge (which might be considered a SINGLE_VALUE) over a period of time, for instance? If yes, do we even need to differentiate between the 2 types? bq. We finally concluded that we should start with storing longs only and make the code strictly accept longs JAX-RS, i.e. the REST API layer, will automatically convert an integral value to Integer if it's less than Integer.MAX_VALUE, so I guess we will have to handle ints and shorts as well; i.e., if it's an Integer for instance, we can call Integer#longValue to convert it to long. bq. Regarding indicating whether to aggregate or not, we suggest to rely mostly on the flow run aggregation. For those use cases that need to access metrics off of tables other than the flow run table (e.g. time-based aggregation), we need to explore ways to specify this information as input (config, etc.) I hope Li Lu is fine with this because I remember him saying on YARN-3816 that he will be using it for offline aggregation in YARN-3817. I think rows from the application table are being used in the MR job there. Are you suggesting that for offline aggregation, based on config, we aggregate all the application metrics (to flow or user) or nothing? Or configure a set of metrics to aggregate in some config? > Change the way metric values are stored in HBase Storage > > > Key: YARN-4053 > URL: https://issues.apache.org/jira/browse/YARN-4053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4053-YARN-2928.01.patch, > YARN-4053-YARN-2928.02.patch > > > Currently HBase implementation uses GenericObjectMapper to convert and store > values in backend HBase storage. 
This converts everything into a string > representation(ASCII/UTF-8 encoded byte array). > While this is fine in most cases, it does not quite serve our use case for > metrics. > So we need to decide how are we going to encode and decode metric values and > store them in HBase. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
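The int/short handling point raised in the comment above, normalizing any integral `Number` via `Number#longValue` before encoding while rejecting non-integral types, could be sketched as follows. `toLong` is an illustrative helper, not the actual YARN-4053 converter.

```java
// Sketch: accept any integral Number (JAX-RS may deserialize small values as
// Integer or Short rather than Long) and normalize it to a long, while
// rejecting floating-point types if the store is to hold longs only.
public class MetricValueSketch {
    public static long toLong(Number value) {
        if (value instanceof Long || value instanceof Integer
            || value instanceof Short || value instanceof Byte) {
            return value.longValue();  // lossless for all integral wrappers
        }
        throw new IllegalArgumentException("Metric value must be integral, got "
            + value.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(toLong(42));         // Integer -> 42
        System.out.println(toLong(42L));        // Long -> 42
        System.out.println(toLong((short) 7));  // Short -> 7
    }
}
```

Whether floats should instead be accepted and widened is exactly the design question the thread is debating; this sketch takes the "longs only" position from the quoted conclusion.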
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997153#comment-14997153 ] Hadoop QA commented on YARN-3862: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 7s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 6m 21s {color} | {color:red} root in YARN-2928 failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 2m 32s {color} | {color:red} hadoop-yarn-server-timelineservice in YARN-2928 failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 12s {color} | {color:red} hadoop-yarn-server-timelineservice in YARN-2928 failed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 12s {color} | {color:red} hadoop-yarn-server-timelineservice in YARN-2928 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s {color} | {color:red} hadoop-yarn-server-timelineservice in YARN-2928 failed with JDK v1.8.0_60. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 12s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 9s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 9s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.7.0_79. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.7.0_79. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s {color} | {color:red} Patch generated 43 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice (total was 102, now 128). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 13s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 11s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 9s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 12s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.7.0_79. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 18s {color} | {color:red} Patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 13m 24s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-09 | | JIRA Patch URL |
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996875#comment-14996875 ] Chang Li commented on YARN-2556: Thanks [~Naganarasimha] for suggesting the optimization! +1 on the idea of creating some initial leveldb data before testing the performance. Created YARN-4339 to work on this idea. > Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Chang Li > Labels: BB2015-05-TBR > Fix For: 2.8.0 > > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, > YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, > YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.15.patch, > YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, > YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, > YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996886#comment-14996886 ] Eric Payne commented on YARN-3769: -- bq. you don't need to do componentwiseMax here, since minPendingAndPreemptable <= headroom, and you can use subtractFrom to make code simpler. [~leftnoteasy], you are right, we do know that {{minPendingAndPreemptable <= headroom}}. Thanks for the catch. I will make those changes. > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, > YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch, > YARN-3769.003.patch, YARN-3769.004.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996687#comment-14996687 ] Jason Lowe commented on YARN-4051: -- If I understand this correctly, we're saying that the problem described in YARN-4050 is holding up the main event dispatcher and the NM is semi-hung, yet we want to hurry and register with the ResourceManager before containers have recovered? Seems to me we need to address the problem described in YARN-4050 if possible (e.g.: skip HDFS operations if we recovered at least one container in the running or completed states since we know it must have done HDFS init in the previous NM instance). Otherwise we are hacking around the fact that we registered too soon and aren't able to properly handle the out-of-order events. I'd much rather deal with the root cause if possible than patch all the separate symptoms. > ContainerKillEvent is lost when container is In New State and is recovering > > > Key: YARN-4051 > URL: https://issues.apache.org/jira/browse/YARN-4051 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee >Priority: Critical > Attachments: YARN-4051.01.patch, YARN-4051.02.patch, > YARN-4051.03.patch > > > As in YARN-4050, NM event dispatcher is blocked, and container is in New > state, when we finish application, the container still alive even after NM > event dispatcher is unblocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996832#comment-14996832 ] Chang Li commented on YARN-2556: Hi [~xgong], here is the usage printed out by the tool {code}
Usage:
  [-m ]            number of mappers (default: 1)
  [-v]             timeline service version
  [-mtype ]
    1. simple entity write mapper
    2. jobhistory files replay mapper
  [-s <(KBs)test>] number of KB per put (mtype=1, default: 1 KB)
  [-t]             package sending iterations per mapper (mtype=1, default: 100)
  [-d ]            root path of job history files (mtype=2)
  [-r ]            (mtype=2)
    1. write all entities for a job in one put (default)
    2. write one entity at a time
{code} There are two modes of testing. One is the simple entity writer, where each mapper creates entities of the size you specify and puts them to the timeline server. The other mode replays jobhistory files, which offers a more realistic test. For the jobhistory replay test, you put the testing jobhistory files (both the job history file and the job conf file) under a directory, then specify that directory with the -d option. You choose the test mode with the -mtype option. Right now the usage is not printed when you pass no options; it is only printed when you pass wrong options. When you give no parameters, the test runs in simple entity write mode with default settings. So maybe we want to print this usage when no parameters are passed? 
> Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Chang Li > Labels: BB2015-05-TBR > Fix For: 2.8.0 > > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, > YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, > YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.15.patch, > YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, > YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, > YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
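The no-arguments fallback suggested in the comment above could be sketched like this. The usage text and `run` method are simplified stand-ins for the actual performance tool's option handling, not its real code.

```java
// Sketch: print the usage text and bail out when the tool is invoked with no
// arguments, instead of silently running with the default settings.
public class UsageSketch {
    static final String USAGE =
        "Usage:\n"
      + "  [-m <maps>]    number of mappers (default: 1)\n"
      + "  [-v]           timeline service version\n"
      + "  [-mtype <n>]   1=simple entity write, 2=jobhistory replay\n";

    public static int run(String[] args) {
        if (args.length == 0) {
            System.err.print(USAGE);
            return -1;  // signal "nothing run" rather than defaulting silently
        }
        // ... parse options and launch the MR job here ...
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("exit code: " + run(args));
    }
}
```

The trade-off is that the current behavior (no args = run with defaults) is convenient for smoke tests, which is presumably why the comment poses it as a question rather than a fix.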
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997327#comment-14997327 ] Jonathan Eagles commented on YARN-4183: --- Here are the requirements that users at scale need, and unfortunately the config design does not allow for this properly. Let me draw up what the requirements in my mind should be, based on my current knowledge. This is by no means an edict, but just a conversation starting point, so you know where I'm coming from. # Jobs that make use of the timeline service may have a hard or soft runtime dependency on the timeline service -- Jobs that interact directly with the timeline service (TimelineClient) should obtain a delegation token to use the service and optionally allow for a non-fatal runtime dependency (the job is allowed to run, but no history is written) -- Jobs that don't interact with the timeline service (EntityFileTimelineClient) should obtain HDFS delegation tokens, but should not obtain timeline service delegation tokens. # Jobs that don't make use of the timeline service should have no runtime dependency on the timeline service and should be freely allowed to submit and run regardless of the timeline service status. # YARN services that interact with the timeline server (Generic History Server) may have a runtime dependency on the timeline service that does not disrupt job submission. The issue regarding this jira is that putting yarn.timeline-service.enabled in the client xml (breaks #2 above) forces every job (both MR (not using the timeline service) and Tez (using the timeline service)) to have a runtime dependency on the timeline service. This places an artificial runtime dependency on the timeline service, which is not highly available or highly scalable until v2.0. 
The issue regarding putting yarn.timeline-service.enabled in the resource manager (breaks #3 above) is that every YarnClientImpl (used in job status, used in job submission) now reaches out to get a delegation token. This places the timeline service (neither highly scalable nor highly available until v2.0) as a runtime dependency for job submission and gets many unnecessary delegation tokens for YarnClients that never intend to use them. > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2047) RM should honor NM heartbeat expiry after RM restart
[ https://issues.apache.org/jira/browse/YARN-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997181#comment-14997181 ] Bikas Saha commented on YARN-2047: -- I think the general idea is that the AM cannot be trusted about allocated resources or running containers. > RM should honor NM heartbeat expiry after RM restart > > > Key: YARN-2047 > URL: https://issues.apache.org/jira/browse/YARN-2047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha > > After the RM restarts, it forgets about existing NM's (and their potentially > decommissioned status too). After restart, the RM cannot maintain the > contract to the AM's that a lost NM's containers will be marked finished > within the expiry time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"
[ https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997296#comment-14997296 ] Jason Lowe commented on YARN-4334: -- Thanks for the prototype, Chang! Ideally when attempting to recover from an old state we should still remember the apps but recover them in a completed state (either killed or failed). It looks like the prototype will cause the RM to completely forget everything, which isn't ideal. Without recovering the state but leaving it in the state store, we risk a situation like the following: # RM restarts late, recovers nothing # RM updates the store timestamp # RM restarts # RM tries to recover all the old state left from the first instance that wasn't cleaned up in the second Was there a reason to use a raw thread and sleeps for the update rather than a Timer? In either case it needs to be a daemon thread. The recovery code should check the version first before doing anything else with the state store. The conf settings give no hints in their name nor any documentation as to what units to use. Is it milliseconds? minutes? hours? Why a default of 1? "RMLivenessKey" should be a static final constant to avoid the chance of typos. The code has no check for the key missing a value -- db.get will return null if the key is missing. Nit: a setting of zero should be equivalent to a -1 setting. It makes no sense to configure it so the store is always expired. > Ability to avoid ResourceManager recovery if state store is "too old" > - > > Key: YARN-4334 > URL: https://issues.apache.org/jira/browse/YARN-4334 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Chang Li > Attachments: YARN-4334.wip.patch > > > There are times when a ResourceManager has been down long enough that > ApplicationMasters and potentially external client-side monitoring mechanisms > have given up completely. 
If the ResourceManager starts back up and tries to > recover we can get into situations where the RM launches new application > attempts for the AMs that gave up, but then the client _also_ launches > another instance of the app because it assumed everything was dead. > It would be nice if the RM could be optionally configured to avoid trying to > recover if the state store was "too old." The RM would come up without any > applications recovered, but we would avoid a double-submission situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
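A minimal sketch of the "too old" check under discussion, assuming the liveness timestamp and the limit are both expressed in milliseconds. The class and method names are hypothetical; the sketch addresses the review points above about unit-less config names and about zero behaving the same as -1.

```java
// Hypothetical sketch of the expiry check: the RM compares the state store's
// last liveness timestamp against a configured maximum age. Units are named
// explicitly in milliseconds, and any non-positive setting (0 or -1)
// disables the check so the store never counts as expired.
class StateStoreExpiry {
  // lastUpdateMillis: timestamp the previous RM instance wrote to the store
  // nowMillis:        current wall-clock time
  // maxAgeMillis:     configured limit; <= 0 means "never expire"
  static boolean isTooOld(long lastUpdateMillis, long nowMillis,
      long maxAgeMillis) {
    if (maxAgeMillis <= 0) {
      return false; // check disabled: always attempt recovery
    }
    return nowMillis - lastUpdateMillis > maxAgeMillis;
  }
}
```

With this shape, zero and -1 are equivalent (recovery is always attempted), matching the nit that configuring an always-expired store makes no sense.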
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997214#comment-14997214 ] Hudson commented on YARN-3840: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #647 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/647/]) YARN-3840. Resource Manager web ui issue when sorting application by id (jianhe: rev 8fbea531d7f7b665f6f55af54c8ebf330118ff37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java > Resource Manager web ui issue when sorting application by id (with > application having id > 9999) > > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Mohammad Shahid Khan > Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, > YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, > yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > 9999. > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997275#comment-14997275 ] Sangjin Lee commented on YARN-2556: --- +1 with the proposal to add documentation. The command line help is useful, but it would be good to have a little more detail in the documentation. > Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Chang Li > Labels: BB2015-05-TBR > Fix For: 2.8.0 > > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, > YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, > YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.15.patch, > YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, > YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, > YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997185#comment-14997185 ] Hudson commented on YARN-3840: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8780 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8780/]) YARN-3840. Resource Manager web ui issue when sorting application by id (jianhe: rev 8fbea531d7f7b665f6f55af54c8ebf330118ff37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java > Resource Manager web ui issue when sorting application by id (with > application having id > 9999) > > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Mohammad Shahid Khan > Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, > YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, > yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > 9999. > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997267#comment-14997267 ] Sangjin Lee commented on YARN-4183: --- Sorry I missed this one as well. Maybe this is a FAQ somewhere, but what are the relationships among the following 3 settings? # yarn.timeline-service.enabled # yarn.timeline-service.generic-application-history.enabled # yarn.resourcemanager.system-metrics-publisher.enabled Can (1) and (2) be set independently, or does setting one have an implication on the other? How about (3)? From the v2 perspective, there is no separate "generic application history service" anyway, and we will have to handle this problem in a different manner. > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4334) Ability to avoid ResourceManager recovery if state store is "too old"
[ https://issues.apache.org/jira/browse/YARN-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997296#comment-14997296 ] Jason Lowe edited comment on YARN-4334 at 11/9/15 8:31 PM: --- Thanks for the prototype, Chang! Ideally when attempting to recover from an old state we should still remember the apps but recover them in a completed state (either killed or failed). It looks like the prototype will cause the RM to completely forget everything, which isn't ideal. Without recovering the state but leaving it in the state store, we risk a situation like the following: # RM restarts late, recovers nothing # RM updates the store timestamp # RM restarts # RM tries to recover all the old state left from the first instance that wasn't cleaned up in the second Was there a reason to use a raw thread and sleeps for the update rather than a Timer? In either case it needs to be a daemon thread. The recovery code should check the version first before doing anything else with the state store. The conf settings give no hints in their name nor any documentation as to what units to use. Is it milliseconds? minutes? hours? Why a default of 1? "RMLivenessKey" should be a static final constant to avoid the chance of typos. The code has no check for the key missing a value -- db.get will return null if the key is missing. Nit: a setting of zero should be equivalent to a -1 setting. It makes no sense to configure it so the store is always expired. was (Author: jlowe): Thanks for the prototype, Chang! Ideally when attempting to recover from an old state we should still remember the apps but recover them in a completed state (either killed or failed). It looks like the prototype will cause the RM to completely forget everything, which isn't ideal. 
Without recovering the state but leaving it in the state store, we risk a situation like the following: # RM restarts late, recovers nothing # RM updates the store timestamp # RM restarts # RM tries to recover all the old state left from the first instance that wasn't cleaned up in the second Was there a reason to use a raw thread and sleeps for the update rather than a Timer? In either case it needs to be a daemon thread. The recovery code should check the version first before doing anything else with the state store. The conf settings give no hints in their name nor any documentation as to what units to use. Is it milliseconds? minutes? hours? Why a default of 1? "RMLivenessKey" should be a static final constant to avoid the chance of typos. The code has no check for the key missing a value -- db.get will return null if the key is missing. Nit: a setting of zero should be equivalent to a -1 setting. It makes no sense to configure it so the store is always expired. > Ability to avoid ResourceManager recovery if state store is "too old" > - > > Key: YARN-4334 > URL: https://issues.apache.org/jira/browse/YARN-4334 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Chang Li > Attachments: YARN-4334.wip.patch > > > There are times when a ResourceManager has been down long enough that > ApplicationMasters and potentially external client-side monitoring mechanisms > have given up completely. If the ResourceManager starts back up and tries to > recover we can get into situations where the RM launches new application > attempts for the AMs that gave up, but then the client _also_ launches > another instance of the app because it assumed everything was dead. > It would be nice if the RM could be optionally configured to avoid trying to > recover if the state store was "too old." The RM would come up without any > applications recovered, but we would avoid a double-submission situation. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-4287: - Attachment: YARN-4287-minimal-v3.patch Noticed a simple spelling error > Capacity Scheduler: Rack Locality improvement > - > > Key: YARN-4287 > URL: https://issues.apache.org/jira/browse/YARN-4287 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, > YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, > YARN-4287-v4.patch, YARN-4287.patch > > > YARN-4189 does an excellent job describing the issues with the current delay > scheduling algorithms within the capacity scheduler. The design proposal also > seems like a good direction. > This jira proposes a simple interim solution to the key issue we've been > experiencing on a regular basis: > - rackLocal assignments trickle out due to nodeLocalityDelay. This can have > significant impact on things like CombineFileInputFormat which targets very > specific nodes in its split calculations. > I'm not sure when YARN-4189 will become reality so I thought a simple interim > patch might make sense. The basic idea is simple: > 1) Separate delays for rackLocal and OffSwitch (today there is only 1) > 2) When we're getting rackLocal assignments, subsequent rackLocal assignments > should not be delayed > Patch will be uploaded shortly. No big deal if the consensus is to go > straight to YARN-4189. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
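The two-delay idea in the description above can be sketched as below. This is a hypothetical simplification of the patch's intent, not the actual CapacityScheduler code; the class and method names are invented for illustration.

```java
// Hypothetical sketch of the interim delay-scheduling idea: separate miss
// thresholds for rack-local and off-switch assignments, and once a rack-local
// assignment has been made, further rack-local assignments are accepted
// immediately instead of waiting out the delay again.
class RackLocalityDelay {
  private final int rackLocalDelay;  // missed opportunities before rack-local is allowed
  private final int offSwitchDelay;  // missed opportunities before off-switch is allowed
  private boolean rackLocalStreak = false; // true after a rack-local assignment

  RackLocalityDelay(int rackLocalDelay, int offSwitchDelay) {
    this.rackLocalDelay = rackLocalDelay;
    this.offSwitchDelay = offSwitchDelay;
  }

  boolean allowRackLocal(int missedOpportunities) {
    // Point 2 of the description: once rack-local assignments are flowing,
    // subsequent rack-local assignments are not delayed again.
    return rackLocalStreak || missedOpportunities >= rackLocalDelay;
  }

  boolean allowOffSwitch(int missedOpportunities) {
    // Point 1: off-switch has its own, typically larger, delay.
    return missedOpportunities >= offSwitchDelay;
  }

  void recordAssignment(boolean rackLocal) {
    // Any non-rack-local assignment ends the streak.
    rackLocalStreak = rackLocal;
  }
}
```

This avoids the trickle-out problem the description mentions: only the first rack-local assignment pays the delay, which matters for splits (e.g. from CombineFileInputFormat) that target specific nodes.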
[jira] [Updated] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4234: Attachment: YARN-4234.20151109.patch > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.20151109.patch, YARN-4234.3.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997899#comment-14997899 ] Hudson commented on YARN-3840: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #589 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/589/]) YARN-3840. Resource Manager web ui issue when sorting application by id (jianhe: rev 8fbea531d7f7b665f6f55af54c8ebf330118ff37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java > Resource Manager web ui issue when sorting application by id (with > application having id > 9999) > > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Mohammad Shahid Khan > Fix For: 2.8.0, 2.7.3 > > Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, > YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, > yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > 9999. > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4339) optimize timeline server performance tool
[ https://issues.apache.org/jira/browse/YARN-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997542#comment-14997542 ] Naganarasimha G R commented on YARN-4339: - Thanks for raising this issue [~lichangleo], I would like the following along with it: * It should be configurable whether to enable or disable populating the data (as it doesn't have any impact on ATS v2, and I'm not sure about ATS v1.5) * The amount of data to be populated (number and size) can also be captured. > optimize timeline server performance tool > - > > Key: YARN-4339 > URL: https://issues.apache.org/jira/browse/YARN-4339 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > > As [~Naganarasimha] suggested in YARN-2556, the test could be optimized by > having some initial LevelDB data before testing the performance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997648#comment-14997648 ] Jason Lowe edited comment on YARN-4311 at 11/9/15 11:31 PM: bq. Could these be part of shutdown nodes or do we need a separate category for such nodes? Would just the count of such nodes suffice or do we want to view them while it's within the grace period? The intent is that the list of nodes would be visible from the UI for some period of time, so users can see where a particular node went after the update. I think these nodes could be part of the shutdown category since they were told to shut down and leave the cluster. was (Author: jlowe): bq. Could these be part of shutdown nodes or do we need a separate category for such nodes? Would just the count of such nodes suffice or do we want to view them while it's within the grace period? The intent is the list of nodes would be visible from the UI for some period of time, so they can see where a particular node went after the update. I think they could be part of the shutdown category since they were told to shut down and leave the cluster. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997655#comment-14997655 ] Kuhu Shukla commented on YARN-4311: --- Thanks [~jlowe]. Will rework my patch accordingly. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997654#comment-14997654 ] Wangda Tan commented on YARN-4338: -- Thanks for the comments, [~sunilg]/[~Naganarasimha]. [~xinwei], I would prefer to keep the main logic as-is and fix the tests; the major concern is that people may think the node label expression requires a null check in CS logic, which could reduce code readability. I'm OK with common code (such as AppSchedulingInfo) checking null for nodeLabelExpression. Could you fix the tests of YARN-2618 instead of updating RegularContainerAllocator? > NPE in RegularContainerAllocator.preCheckForNewContainer() > -- > > Key: YARN-4338 > URL: https://issues.apache.org/jira/browse/YARN-4338 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xinwei Qin >Priority: Minor > Attachments: YARN-4338.001.patch > > > The codes in RegularContainerAllocator.preCheckForNewContainer(): > {code} > if (anyRequest.getNodeLabelExpression() > .equals(RMNodeLabelsManager.NO_LABEL)) { > missedNonPartitionedRequestSchedulingOpportunity = > application > .addMissedNonPartitionedRequestSchedulingOpportunity(priority); > } > {code} > {code}anyRequest.getNodeLabelExpression(){code}may return null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997648#comment-14997648 ] Jason Lowe commented on YARN-4311: -- bq. Could these be part of shutdown nodes or do we need a separate category for such nodes? Would just the count of such nodes suffice or do we want to view them while it's within the grace period? The intent is the list of nodes would be visible from the UI for some period of time, so they can see where a particular node went after the update. I think they could be part of the shutdown category since they were told to shut down and leave the cluster. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997650#comment-14997650 ] Hadoop QA commented on YARN-4287: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} Patch generated 4 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 198, now 202). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 10s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 41s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_79. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 151m 16s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_79 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.0 Server=1.7.0 Image:test-patch-base-hadoop-date2015-11-09 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12771428/YARN-4287-minimal-v3.patch | | JIRA Issue | YARN-4287 | | Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile | | uname | Linux 3182d018451a
[jira] [Created] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
Xinwei Qin created YARN-4338: - Summary: NPE in RegularContainerAllocator.preCheckForNewContainer() Key: YARN-4338 URL: https://issues.apache.org/jira/browse/YARN-4338 Project: Hadoop YARN Issue Type: Bug Reporter: Xinwei Qin Priority: Minor The code in RegularContainerAllocator.preCheckForNewContainer(): {code} if (anyRequest.getNodeLabelExpression() .equals(RMNodeLabelsManager.NO_LABEL)) { missedNonPartitionedRequestSchedulingOpportunity = application .addMissedNonPartitionedRequestSchedulingOpportunity(priority); } {code} {{anyRequest.getNodeLabelExpression()}} may return null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
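One null-safe rewrite of the check above is to invert the receiver of {{equals()}}, since the constant is never null. This is a hedged sketch, not the actual patch; the class name is hypothetical, and the constant stands in for {{RMNodeLabelsManager.NO_LABEL}} (the empty string):

```java
// Hypothetical sketch, not the actual YARN fix: calling equals() on the
// non-null constant means a null node-label expression yields false
// instead of throwing a NullPointerException.
public class NullSafeLabelCheck {
    // stand-in for RMNodeLabelsManager.NO_LABEL (the empty string)
    static final String NO_LABEL = "";

    static boolean isNonPartitionedRequest(String nodeLabelExpression) {
        // NO_LABEL.equals(null) is simply false, so no NPE here
        return NO_LABEL.equals(nodeLabelExpression);
    }
}
```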
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996185#comment-14996185 ] Sunil G commented on YARN-4338: --- Recently in YARN-4250, there was a chance that {{anyRequest.getNodeLabelExpression()}} becomes null, because ApplicationMasterService may not always normalize the expression.
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996191#comment-14996191 ] Naganarasimha G R commented on YARN-4338: - Hi [~xinwei], What was the scenario in which you got this NPE?
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996241#comment-14996241 ] Naganarasimha G R commented on YARN-4338: - Hi [~xinwei], In that case it is expected to come through ApplicationMasterService, so maybe it is sufficient to rectify the test case with the default label "". [~sunilg] & [~wangda] But as we are coming across this more frequently, how about correcting it by setting the default label in the other overloaded methods, or checking for null in the main overloaded method and setting it to the default, i.e. ""?
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996336#comment-14996336 ] Varun Saxena commented on YARN-2934: Thanks [~Naganarasimha] for uploading the patch. Sorry I could not do a thorough review earlier; I had a cursory glance at the latest patch. A few quick comments. * In the code below, *instead of compiling the pattern again and again*, we can compile it once and store it in a static variable (because it is taken from config and hence won't change). Pattern#compile incurs a performance overhead if called repeatedly. {code} String errorFileNameRegexPattern = conf.get(YarnConfiguration.NM_CONTAINER_ERROR_FILE_NAME_PATTERN, YarnConfiguration.DEFAULT_NM_CONTAINER_ERROR_FILE_NAME_PATTERN); Pattern pattern = null; try { pattern = Pattern.compile(errorFileNameRegexPattern, Pattern.CASE_INSENSITIVE); } catch (PatternSyntaxException e) { pattern = Pattern.compile( YarnConfiguration.DEFAULT_NM_CONTAINER_ERROR_FILE_NAME_PATTERN, Pattern.CASE_INSENSITIVE); } {code} * Also IMO, at least a warning log should be printed if the configured pattern cannot compile. This can alert the user about wrong configuration. Should we consider not starting up the NM in this case (if the config is wrong)? Maybe it is not that important a config to block NM startup; an alert message should be enough. * Moreover, you can also consider using Configuration#getPattern, but take care of using it only once. > Improve handling of container's stderr > --- > > Key: YARN-2934 > URL: https://issues.apache.org/jira/browse/YARN-2934 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Gera Shegalov >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, > YARN-2934.v1.003.patch, YARN-2934.v1.004.patch > > > Most YARN applications redirect stderr to some file. That's why when > container launch fails with {{ExitCodeException}} the message is empty. 
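The compile-once suggestion above can be sketched roughly as follows. The constant name, default regex, and warning text are illustrative, not YARN's actual configuration keys or values:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Rough sketch of the review comment: compile the configured regex a
// single time, warn on an invalid pattern instead of refusing to start
// the NodeManager, and fall back to the default pattern.
public class ErrorFileNamePattern {
    // illustrative stand-in for the default error-file-name pattern
    static final String DEFAULT_REGEX = ".*stderr.*";

    private static volatile Pattern compiled; // built at most once

    static Pattern get(String configuredRegex) {
        if (compiled == null) {
            try {
                compiled = Pattern.compile(configuredRegex, Pattern.CASE_INSENSITIVE);
            } catch (PatternSyntaxException e) {
                // alert the user about the bad config rather than
                // silently swallowing it
                System.err.println("WARN: invalid error file pattern '"
                    + configuredRegex + "', using default: " + e.getMessage());
                compiled = Pattern.compile(DEFAULT_REGEX, Pattern.CASE_INSENSITIVE);
            }
        }
        return compiled;
    }
}
```

Subsequent calls reuse the cached {{Pattern}}, which addresses the repeated {{Pattern#compile}} overhead.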
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996335#comment-14996335 ] Naganarasimha G R commented on YARN-2934: - The Findbugs warning is not related to this jira, and the checkstyle & whitespace issues can be corrected as part of the next patch. Waiting for review comments! cc/ [~jira.shegalov] & [~bikassaha].
[jira] [Updated] (YARN-4050) NM event dispatcher may blocked by LogAggregationService if NameNode is slow
[ https://issues.apache.org/jira/browse/YARN-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4050: --- Assignee: (was: sandflee) > NM event dispatcher may blocked by LogAggregationService if NameNode is slow > > > Key: YARN-4050 > URL: https://issues.apache.org/jira/browse/YARN-4050 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee > > env: NM restart and log aggregation is enabled. > NN is almost dead; when we restart the NM, the NM event dispatcher is blocked until > NN returns to normal. It seems the NM recovered the app and sent an APPLICATION_START > event to the log aggregation service, which will check the log dir permission in > HDFS (BLOCKED)
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996338#comment-14996338 ] Naganarasimha G R commented on YARN-4338: - meant => ??how about correcting it in *ResourceRequest* with setting Default Label when using other overloaded methods or in the main overloaded method we can check for null and set to Default label i.e. ""??
[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996345#comment-14996345 ] sandflee commented on YARN-4051: Is it possible for the finish-application or complete-container requests to arrive at this point? Yes, we see this in YARN-4050. If we register with the RM only after container recovery completes, we face the risk that containers running on this node will be killed if container recovery takes much more time (as in YARN-4050); for long-running services this is maybe not so perfect. > ContainerKillEvent is lost when container is In New State and is recovering > > > Key: YARN-4051 > URL: https://issues.apache.org/jira/browse/YARN-4051 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee >Priority: Critical > Attachments: YARN-4051.01.patch, YARN-4051.02.patch, > YARN-4051.03.patch > > > As in YARN-4050, NM event dispatcher is blocked, and container is in New > state; when we finish the application, the container stays alive even after the NM > event dispatcher is unblocked.
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997663#comment-14997663 ] Wangda Tan commented on YARN-4338: -- I don't know what the impact of it would be, since we're leveraging nodeLabelExpression == null to represent "unset" in ResourceRequest. I think some code paths will fail if ResourceRequest.getNodeLabelExpression returns "" when it is null.
[jira] [Commented] (YARN-4287) Capacity Scheduler: Rack Locality improvement
[ https://issues.apache.org/jira/browse/YARN-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997573#comment-14997573 ] Wangda Tan commented on YARN-4287: -- Thanks for the update, [~nroberts]. The patch generally looks good; a few comments: - Could you add a comment at {code} return (Math.min(rmContext.getScheduler().getNumClusterNodes(), (requiredContainers * localityWaitFactor)) < missedOpportunities); {code} so that people reading the code can better understand why missedOpportunities needs to be capped by numClusterNodes - I would suggest adding tests for missedOpportunities being capped by numClusterNodes and for resetSchedulingOpportunity on a rack request. > Capacity Scheduler: Rack Locality improvement > - > > Key: YARN-4287 > URL: https://issues.apache.org/jira/browse/YARN-4287 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-4287-minimal-v2.patch, YARN-4287-minimal-v3.patch, > YARN-4287-minimal.patch, YARN-4287-v2.patch, YARN-4287-v3.patch, > YARN-4287-v4.patch, YARN-4287.patch > > > YARN-4189 does an excellent job describing the issues with the current delay > scheduling algorithms within the capacity scheduler. The design proposal also > seems like a good direction. > This jira proposes a simple interim solution to the key issue we've been > experiencing on a regular basis: > - rackLocal assignments trickle out due to nodeLocalityDelay. This can have > significant impact on things like CombineFileInputFormat which targets very > specific nodes in its split calculations. > I'm not sure when YARN-4189 will become reality so I thought a simple interim > patch might make sense. The basic idea is simple: > 1) Separate delays for rackLocal, and OffSwitch (today there is only 1) > 2) When we're getting rackLocal assignments, subsequent rackLocal assignments > should not be delayed > Patch will be uploaded shortly. No big deal if the consensus is to go > straight to YARN-4189.
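The capped check quoted in the comment above can be sketched in isolation as below; the class, method, and parameter names are illustrative, not the scheduler's actual signature:

```java
// Illustrative sketch of the capped locality-wait check: the number of
// missed scheduling opportunities an application tolerates before
// relaxing locality is bounded by the cluster size, so a request with a
// large (requiredContainers * localityWaitFactor) product does not wait
// longer than one pass over the cluster's nodes.
public class LocalityWaitCap {
    static boolean shouldRelaxLocality(int numClusterNodes,
                                       int requiredContainers,
                                       float localityWaitFactor,
                                       long missedOpportunities) {
        double wait = Math.min(numClusterNodes,
                               requiredContainers * localityWaitFactor);
        return wait < missedOpportunities;
    }
}
```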
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997652#comment-14997652 ] Hudson commented on YARN-3840: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2588 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2588/]) YARN-3840. Resource Manager web ui issue when sorting application by id (jianhe: rev 8fbea531d7f7b665f6f55af54c8ebf330118ff37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-plugin-1.10.7/sorting/natural.js.gz * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/CHANGES.txt > Resource Manager web ui issue when sorting application by id (with > application having id > 9999) > > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Mohammad Shahid Khan > Fix For: 2.8.0, 2.7.3 > > Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, > YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, > yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > 9999. > With command line it works (# yarn application -list). > Regards, > Alexandre
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997659#comment-14997659 ] Naganarasimha G R commented on YARN-4338: - Hi [~wangda], How about setting the Default Label in ResourceRequest when not set?
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997734#comment-14997734 ] Wangda Tan commented on YARN-3946: -- 1) Is it possible to merge amLaunchDiagnostics and the other diagnostics? That could simplify the RMAppAttemptImpl implementation. 2) Could you take a look at my previous comment? bq. Since RMAppAttempt and SchedulerApplicationAttempt have a 1-to-1 relationship, we can save a reference to RMAppAttempt in SchedulerApplicationAttempt, which avoids getting it from RMContext.getRMApps()... 3) I feel this may not be needed (no code change needed for your latest patch) bq. Since String is immutable, amLaunchDiagnostics could be volatile so we don't need to acquire locks. Since createApplicationAttemptReport currently holds a big readLock, we don't need to spend extra time on the volatile. 4) Suggestions about the diagnostic message: - Have an internal field to record when the latest update for the app happened. We can print it with the diagnostic message, e.g. {{\[23 sec before\] }}. - And we can use the above field to prevent excessive updating of the diagnostic message; currently it is updated on every heartbeat for every accessed application. I think we should limit the frequency of updating to avoid overhead; hardcoding it to 1 sec seems fine for now, and we can make it configurable if people start complaining :) - Generally, I think the message format could be: {{Last update from scheduler: (such as 23 sec before); (such as "Application is activated, waiting for allocating AM container"); Details: (instead of GenericInfo) Partition=x, queue's absolute capacity ... (and other fields in your patch)}} - After the AM container is allocated and running, the above message is still useful because people can understand whether the application is actively allocating resources or staying in the queue waiting to be accessed. 
> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc.
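The volatile-String and once-per-second throttle ideas in points 3 and 4 above might be combined roughly as follows; the class and field names are hypothetical, not from the patch:

```java
// Hypothetical sketch of a rate-limited diagnostics holder: a volatile
// reference to an immutable String can be read without locking, and
// updates arriving faster than once per interval are skipped so that
// per-heartbeat updates do not add overhead for every application.
public class AmLaunchDiagnostics {
    static final long MIN_UPDATE_INTERVAL_MS = 1000; // hardcoded, per the comment

    private volatile String diagnostics = "";
    // initialized so the very first update is always accepted
    private volatile long lastUpdateMs = -MIN_UPDATE_INTERVAL_MS;

    boolean maybeUpdate(String message, long nowMs) {
        if (nowMs - lastUpdateMs < MIN_UPDATE_INTERVAL_MS) {
            return false; // too soon since the previous update; skip
        }
        diagnostics = message;
        lastUpdateMs = nowMs;
        return true;
    }

    String get() {
        return diagnostics; // lock-free read of the immutable String
    }
}
```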
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996236#comment-14996236 ] Hadoop QA commented on YARN-2934: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 6s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 2s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 16s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 50s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 277, now 278). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 9 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 51s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_60. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 30s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 3s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 5s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black}
[jira] [Updated] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated YARN-4338: -- Attachment: YARN-4338.001.patch A simple fix is to check whether the value is null.
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996194#comment-14996194 ] Sunil G commented on YARN-4338: --- I missed adding a point earlier: are you using a custom scheduler here?
[jira] [Commented] (YARN-4338) NPE in RegularContainerAllocator.preCheckForNewContainer()
[ https://issues.apache.org/jira/browse/YARN-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996246#comment-14996246 ] Xinwei Qin commented on YARN-4338: --- Thanks [~Naganarasimha] for your suggestion, the test case passed with this modification.
[jira] [Updated] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2934: Attachment: YARN-2934.v1.005.patch Thanks for the comments [~varun_saxena], bq. Pattern#compile incurs a performance overhead if called again and again. I had thought of handling this based on further comments, and it is not in a critical/repetitive code flow, but it is worth optimizing anyway, hence done in this patch. bq. Should we consider not starting up NM in this case(if config is wrong) ? Maybe its not that important a config to not start NM. An alert message should be enough. As discussed, an alert is enough as it is not critical. bq. Moreover, you can also consider using Configuration#getPattern, but take care of using it only once. Yep, this would be useful and also takes care of your 2nd comment, hence I am using it, but adding one more method there to ignore the case.
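The case-ignoring {{getPattern}} variant mentioned at the end could be shaped along these lines. This is an assumed sketch, not Hadoop's actual {{Configuration}} API; a plain {{Map}} stands in for the configuration object:

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Assumed sketch of a Configuration#getPattern-style helper that compiles
// the configured regex with CASE_INSENSITIVE, warns on a malformed value,
// and falls back to the supplied default Pattern.
public class CaseInsensitivePatternConf {
    static Pattern getPatternIgnoreCase(Map<String, String> conf,
                                        String key, Pattern defaultValue) {
        String value = conf.get(key);
        if (value == null || value.isEmpty()) {
            return defaultValue; // key absent: behave like getPattern's default
        }
        try {
            return Pattern.compile(value, Pattern.CASE_INSENSITIVE);
        } catch (PatternSyntaxException e) {
            // alert the user about the wrong configuration
            System.err.println("WARN: regex for " + key
                + " is invalid, using default: " + e.getMessage());
            return defaultValue;
        }
    }
}
```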