[jira] [Created] (YARN-1733) Intermittent failed for TestRMWebServicesApps
Junping Du created YARN-1733:
--------------------------------

Summary: Intermittent failed for TestRMWebServicesApps
Key: YARN-1733
URL: https://issues.apache.org/jira/browse/YARN-1733
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Junping Du

In some Jenkins builds (e.g. YARN-1506, YARN-1641), TestRMWebServicesApps fails with a log like:

java.lang.AssertionError: incorrect number of elements expected:<20> but was:<18>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.verifyAppInfo(TestRMWebServicesApps.java:1321)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.testSingleAppsHelper(TestRMWebServicesApps.java:1261)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.testSingleApp(TestRMWebServicesApps.java:1153)

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903297#comment-13903297 ]

Jason Lowe commented on YARN-221:
---------------------------------

Personally I think the AM racing to kill tasks that have indicated they are done is a bug. It causes all sorts of problems:

- Occasional "Container killed by ApplicationMaster" messages on otherwise normal tasks confuse users into thinking something went wrong with some of their tasks.
- Trying to take a Java profile for a task can fail if the profile dump takes too long or the kill arrives too quickly (see MAPREDUCE-5465).
- Killing a task that should otherwise be exiting on its own creates a constant race-condition scenario that has caused problems in other similar setups (see MAPREDUCE-4157 for a similar situation where the RM was killing AMs too early and causing problems).

I think we should fix these races by implementing a reasonable delay between a task reporting a terminal state and a kill being issued by the AM. That allows the task to complete on its own with an appropriate exit code, eliminating the need to specify log states on stop as a workaround.

NM should provide a way for AM to tell it not to aggregate logs.

Key: YARN-221
URL: https://issues.apache.org/jira/browse/YARN-221
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Robert Joseph Evans
Assignee: Chris Trezzo
Attachments: YARN-221-trunk-v1.patch

The NodeManager should provide a way for an AM to tell it that the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated with a lower priority. The AM should be able to do this in the ContainerLaunchContext to provide a default value, but should also be able to update the value when the container is released. This would allow the NM to skip aggregating logs in some cases, and avoid connecting to the NN at all.
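The delayed-kill approach described above could be sketched roughly as follows. This is a minimal illustration, not MapReduce AM code: TaskKiller, the grace period, and the Runnable kill action are all hypothetical names. The task gets a grace period to exit on its own after reporting a terminal state, and the pending kill is cancelled if the container exits first.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: delay the AM's kill of a task that has already
// reported a terminal state, so a healthy task can exit with its own
// exit code instead of being killed mid-shutdown.
public class TaskKiller {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final long gracePeriodMs;

    public TaskKiller(long gracePeriodMs) {
        this.gracePeriodMs = gracePeriodMs;
    }

    /** Task reported SUCCEEDED/FAILED: schedule the kill after a delay. */
    public ScheduledFuture<?> scheduleKill(Runnable killContainer) {
        return scheduler.schedule(killContainer, gracePeriodMs, TimeUnit.MILLISECONDS);
    }

    /** Container exited cleanly on its own: cancel the pending kill. */
    public void onContainerExited(ScheduledFuture<?> pendingKill) {
        pendingKill.cancel(false);
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With such a delay in place, only tasks that genuinely hang past the grace period would see a kill, which would also make the log-aggregation workaround above unnecessary for the common case.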
[jira] [Updated] (YARN-1732) Change types of related entities and primary filters in ATSEntity
[ https://issues.apache.org/jira/browse/YARN-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-1732:
------------------------------
Attachment: YARN-1732.3.patch

Thanks for the patch, Billie! It looks good to me overall. I uploaded a new patch based on yours, but fixed some javadoc and cleaned up an unnecessary import.

Change types of related entities and primary filters in ATSEntity

Key: YARN-1732
URL: https://issues.apache.org/jira/browse/YARN-1732
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
Attachments: YARN-1732.1.patch, YARN-1732.2.patch, YARN-1732.3.patch

The current types Map<String, List<String>> relatedEntities and Map<String, Object> primaryFilters have issues. The List<String> value of the related entities map could contain multiple identical strings, which doesn't make sense. A more major issue is that we cannot allow primary filter values to be overwritten, because otherwise we will be unable to find those primary filter entries when we want to delete an entity (without doing a nearly full scan). I propose changing related entities to Map<String, Set<String>> and primary filters to Map<String, Set<Object>>. The basic methods to add primary filters and related entities are of the form add(key, value) and will not need to change.
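As a rough illustration of the proposed change (not the actual ATSEntity code; the class and method names here are hypothetical), an add(key, value) method backed by Map<String, Set<Object>> accumulates values rather than overwriting them, so earlier filter entries remain findable when the entity is later deleted:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of Set-valued primary filters: each add() call
// appends to the value set instead of replacing it, and the Set also
// drops duplicate values (the List<String> problem noted above).
public class PrimaryFilters {
    private final Map<String, Set<Object>> filters = new HashMap<String, Set<Object>>();

    public void addPrimaryFilter(String key, Object value) {
        Set<Object> values = filters.get(key);
        if (values == null) {
            values = new HashSet<Object>();
            filters.put(key, values);
        }
        values.add(value);  // never overwrites an existing value
    }

    public Set<Object> getPrimaryFilters(String key) {
        Set<Object> values = filters.get(key);
        return values == null ? Collections.<Object>emptySet() : values;
    }
}
```

The same shape applies to related entities with Set<String> values, and the add(key, value) call signature stays unchanged for callers.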
[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903589#comment-13903589 ]

Patrick Wendell commented on YARN-1530:
---------------------------------------

Hey,

Thanks for the explanation! Let me make sure I understand how this would all work by walking through an example. For the Spark UI we are currently implementing the ability to serialize and write events to HDFS, then load them later from a history server that can render the UI for jobs that are finished. AFAIK this is basically how MapReduce works as well (?)

Say users have set up a YARN cluster and have set up event ingestion into this shared store. Then Spark would need two things to integrate with it:

1. Be able to represent our events in JSON and hook into whatever source the user has set up for ingestion (Flume, HDFS, etc).
2. Be able to render our history timeline UI by reading event data from this store.

Correct? The benefit would be that if users set up something fancy like Flume, they could leverage the same infrastructure for Spark as for other applications, since there is a shared event model. Also, they would benefit from the faster indexed serving offered by this application when rendering the history UI... Is that the main idea? I'm just trying to figure out what redundant work is saved by having a generic framework, since each application writes its own UI and has its own event model. From what I can tell the benefit is that a shared ingestion and serving infrastructure can be used.

[Umbrella] Store, manage and serve per-framework application-timeline data

Key: YARN-1530
URL: https://issues.apache.org/jira/browse/YARN-1530
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Attachments: application timeline design-20140108.pdf, application timeline design-20140116.pdf, application timeline design-20140130.pdf, application timeline design-20140210.pdf

This is a sibling JIRA for YARN-321.

Today, each application/framework has to store and serve its per-framework data all by itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management and serving of per-framework data from various applications, both running and finished. The aim is to change YARN to collect and store data in a generic manner, with plugin points for frameworks to do their own thing w.r.t. interpretation and serving.
[jira] [Commented] (YARN-1732) Change types of related entities and primary filters in ATSEntity
[ https://issues.apache.org/jira/browse/YARN-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903594#comment-13903594 ]

Hadoop QA commented on YARN-1732:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629433/YARN-1732.3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3114//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3114//console

This message is automatically generated.

Change types of related entities and primary filters in ATSEntity

Key: YARN-1732
URL: https://issues.apache.org/jira/browse/YARN-1732
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
Attachments: YARN-1732.1.patch, YARN-1732.2.patch, YARN-1732.3.patch

The current types Map<String, List<String>> relatedEntities and Map<String, Object> primaryFilters have issues. The List<String> value of the related entities map could contain multiple identical strings, which doesn't make sense. A more major issue is that we cannot allow primary filter values to be overwritten, because otherwise we will be unable to find those primary filter entries when we want to delete an entity (without doing a nearly full scan). I propose changing related entities to Map<String, Set<String>> and primary filters to Map<String, Set<Object>>. The basic methods to add primary filters and related entities are of the form add(key, value) and will not need to change.
[jira] [Updated] (YARN-1428) RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state
[ https://issues.apache.org/jira/browse/YARN-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-1428:
------------------------------
Attachment: YARN-1428.3.patch

Thanks for your review, Jian! I uploaded a new patch with refactoring of TestRMAppTransitions and TestRMAppAttemptTransitions.

RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

Key: YARN-1428
URL: https://issues.apache.org/jira/browse/YARN-1428
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Attachments: YARN-1428.1.patch, YARN-1428.2.patch, YARN-1428.3.patch

ApplicationFinishData and ApplicationAttemptFinishData are written in the final transitions of RMApp/RMAppAttempt respectively. However, in those transitions, getState() does not return the state that the RMApp/RMAppAttempt is about to enter, but the prior one.
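The underlying issue can be shown with a toy state machine (purely illustrative, not the RM's StateMachine implementation): if the current-state field is updated only after the transition hook runs, any getState() call made inside the hook still reports the pre-transition state.

```java
// Toy state machine sketch: the transition hook runs before the state
// field is updated, so code in the hook that records getState() (as the
// history writers do) captures the prior state, not the final one.
public class MiniStateMachine {
    public enum State { RUNNING, FINISHED }

    private State current = State.RUNNING;

    public State getState() {
        return current;
    }

    /** Runs the transition hook first, then moves to the target state. */
    public State doTransition(State target, Runnable hook) {
        hook.run();        // getState() here still returns the old state
        current = target;  // the new state is visible only after the hook
        return current;
    }
}
```

This is why writing ApplicationFinishData from inside the final transition records the wrong state: the fix has to pass the target state explicitly (or write after the transition completes) rather than rely on getState().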
[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903621#comment-13903621 ]

Zhijie Shen commented on YARN-1530:
-----------------------------------

bq. 1. Be able to represent our events in JSON and hook into whatever source the user has set up for ingestion (flume, HDFS, etc).

Currently, you can either compose your JSON content and send the HTTP request yourself, or make use of the ATSClient and ATS POJO classes (the names may be refactored). The latter way should be comparatively easier. If the communication medium is changed in the future, we're going to provide a relevant user lib to publish data easily.

bq. Is that the main idea?

Yes, I think so. We want to relieve developers from building a history server from nothing. Let YARN provide the infrastructure for generic applications, and have each application focus on its individual logic, such as the data model and how to render the data.

[Umbrella] Store, manage and serve per-framework application-timeline data

Key: YARN-1530
URL: https://issues.apache.org/jira/browse/YARN-1530
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Attachments: application timeline design-20140108.pdf, application timeline design-20140116.pdf, application timeline design-20140130.pdf, application timeline design-20140210.pdf

This is a sibling JIRA for YARN-321. Today, each application/framework has to store and serve its per-framework data all by itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management and serving of per-framework data from various applications, both running and finished. The aim is to change YARN to collect and store data in a generic manner, with plugin points for frameworks to do their own thing w.r.t. interpretation and serving.
[jira] [Commented] (YARN-1428) RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state
[ https://issues.apache.org/jira/browse/YARN-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903637#comment-13903637 ]

Hadoop QA commented on YARN-1428:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629441/YARN-1428.3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3115//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3115//console

This message is automatically generated.

RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

Key: YARN-1428
URL: https://issues.apache.org/jira/browse/YARN-1428
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Attachments: YARN-1428.1.patch, YARN-1428.2.patch, YARN-1428.3.patch

ApplicationFinishData and ApplicationAttemptFinishData are written in the final transitions of RMApp/RMAppAttempt respectively. However, in those transitions, getState() does not return the state that the RMApp/RMAppAttempt is about to enter, but the prior one.
[jira] [Commented] (YARN-1724) Race condition in Fair Scheduler when continuous scheduling is turned on
[ https://issues.apache.org/jira/browse/YARN-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903699#comment-13903699 ]

Junping Du commented on YARN-1724:
----------------------------------

I would prefer to remove the sort rather than add a lock. Whether we sort or not, we iterate over all nodes and check whether attemptScheduling() can be done, which makes the sort sound unnecessary. It would only make sense if we skipped some nodes with less resources, but then an additional lock may be needed. Thoughts?

Race condition in Fair Scheduler when continuous scheduling is turned on

Key: YARN-1724
URL: https://issues.apache.org/jira/browse/YARN-1724
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1724-1.patch, YARN-1724.patch

If nodes' resource allocations change during Collections.sort(nodeIdList, nodeAvailableResourceComparator); we'll hit:

java.lang.IllegalArgumentException: Comparison method violates its general contract!
[jira] [Commented] (YARN-1724) Race condition in Fair Scheduler when continuous scheduling is turned on
[ https://issues.apache.org/jira/browse/YARN-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903769#comment-13903769 ]

Karthik Kambatla commented on YARN-1724:
----------------------------------------

Sorting here helps balance load better between nodes. Given that not all containers run for the same duration, round-robin alone wouldn't lead to balanced load.

Race condition in Fair Scheduler when continuous scheduling is turned on

Key: YARN-1724
URL: https://issues.apache.org/jira/browse/YARN-1724
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-1724-1.patch, YARN-1724.patch

If nodes' resource allocations change during Collections.sort(nodeIdList, nodeAvailableResourceComparator); we'll hit:

java.lang.IllegalArgumentException: Comparison method violates its general contract!
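For reference, one common way to keep the sort without a lock, whether or not it is ultimately kept here, is to snapshot each node's comparison key before sorting, so concurrent resource updates cannot reorder elements mid-sort and trip TimSort's consistency check. This is a hedged sketch with illustrative names (StableNodeSort, an Integer standing in for available resources), not the Fair Scheduler code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch: copy (node, availableResource) pairs at one instant, then sort
// the frozen copy. The comparator only ever sees immutable snapshot
// values, so it stays consistent for the duration of the sort even if
// the live resource map keeps changing.
public class StableNodeSort {
    public static <T> List<Map.Entry<T, Integer>> sortBySnapshot(
            Map<T, Integer> liveResources) {
        List<Map.Entry<T, Integer>> snapshot = new ArrayList<Map.Entry<T, Integer>>();
        for (Map.Entry<T, Integer> e : liveResources.entrySet()) {
            // SimpleImmutableEntry freezes the value; a live entry view would not.
            snapshot.add(new AbstractMap.SimpleImmutableEntry<T, Integer>(
                e.getKey(), e.getValue()));
        }
        // Most available resource first.
        Collections.sort(snapshot, new Comparator<Map.Entry<T, Integer>>() {
            public int compare(Map.Entry<T, Integer> a, Map.Entry<T, Integer> b) {
                return Integer.compare(b.getValue(), a.getValue());
            }
        });
        return snapshot;
    }
}
```

The trade-off is a copy per scheduling pass; the sorted order may be slightly stale, but that is already true the moment any comparator-based sort over live data finishes.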