[jira] [Created] (YARN-1733) Intermittent failure of TestRMWebServicesApps

2014-02-17 Thread Junping Du (JIRA)
Junping Du created YARN-1733:


 Summary: Intermittent failure of TestRMWebServicesApps
 Key: YARN-1733
 URL: https://issues.apache.org/jira/browse/YARN-1733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Junping Du


In some Jenkins runs (e.g. for YARN-1506 and YARN-1641),
TestRMWebServicesApps fails with the following log:

java.lang.AssertionError: incorrect number of elements expected:<20> but was:<18>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.verifyAppInfo(TestRMWebServicesApps.java:1321)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.testSingleAppsHelper(TestRMWebServicesApps.java:1261)
	at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.testSingleApp(TestRMWebServicesApps.java:1153)
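
The failing check is an exact element-count assertion on the app's JSON representation, roughly of this shape (a sketch of the assertion style only, not the exact verifyAppInfo code):

{code:java}
// Sketch only (assumed shape, not the exact verifyAppInfo code): an exact
// field-count assertion is fragile if some fields are only populated once
// the app reaches a particular state.
JSONObject info = json.getJSONObject("app");
assertEquals("incorrect number of elements", 20, info.length());
{code}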




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2014-02-17 Thread Jason Lowe (JIRA)

[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903297#comment-13903297 ]

Jason Lowe commented on YARN-221:
-

Personally I think the AM racing to kill tasks that have indicated they are 
done is a bug.  It causes all sorts of problems:

- Occasional "Container killed by ApplicationMaster" messages on otherwise 
normal tasks confuse users into thinking something went wrong with some of 
their tasks
- Trying to take a java profile for a task can fail if the profile dump takes 
too long or the kill arrives too quickly (see MAPREDUCE-5465)
- Killing a task that should otherwise be exiting on its own creates a constant 
race-condition scenario that has caused problems in other similar setups (see 
MAPREDUCE-4157 for a similar situation where the RM was killing AMs too early 
and causing problems).

I think we should fix these races by implementing a reasonable delay between a 
task reporting a terminal state and a kill being issued by the AM.  That allows 
the task to complete on its own with an appropriate exit code, eliminating the 
need to specify log states on stop as a workaround.
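
A rough sketch of the delay I have in mind (illustrative only; onTaskReportedDone, hasContainerExited and killContainer are hypothetical names, not the actual MR AM API):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: schedule the kill a few seconds out and skip it if the
// container has already exited on its own with its proper exit code.
private final ScheduledExecutorService killScheduler =
    Executors.newSingleThreadScheduledExecutor();
private static final long KILL_DELAY_MS = 5000; // hypothetical tunable

void onTaskReportedDone(final TaskAttemptId id) {
  killScheduler.schedule(new Runnable() {
    public void run() {
      if (!hasContainerExited(id)) { // hypothetical check
        killContainer(id);           // kill only if it hasn't exited yet
      }
    }
  }, KILL_DELAY_MS, TimeUnit.MILLISECONDS);
}
{code}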

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Robert Joseph Evans
Assignee: Chris Trezzo
 Attachments: YARN-221-trunk-v1.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunchContext to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow the NM to not aggregate logs in some cases, avoiding any 
 connection to the NN at all.
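
 One possible shape for the knob (purely hypothetical; this enum and setter do not exist in YARN, they just illustrate the proposal):

{code:java}
// Hypothetical illustration of the proposed knob, not an existing YARN API.
public enum ContainerLogAggregationPolicy {
  DO_NOT_AGGREGATE,        // NM skips aggregation and never contacts the NN
  AGGREGATE_HIGH_PRIORITY, // aggregate ahead of other containers' logs
  AGGREGATE_LOW_PRIORITY   // aggregate whenever the NM gets around to it
}

// Default supplied at launch, updatable when the container is released:
containerLaunchContext.setLogAggregationPolicy(
    ContainerLogAggregationPolicy.DO_NOT_AGGREGATE);
{code}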



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1732) Change types of related entities and primary filters in ATSEntity

2014-02-17 Thread Zhijie Shen (JIRA)

 [ https://issues.apache.org/jira/browse/YARN-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-1732:
--

Attachment: YARN-1732.3.patch

Thanks for the patch, Billie! It looks good to me overall. I uploaded a new 
patch based on yours, fixing some javadoc and cleaning up an unnecessary 
import.

 Change types of related entities and primary filters in ATSEntity
 -

 Key: YARN-1732
 URL: https://issues.apache.org/jira/browse/YARN-1732
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1732.1.patch, YARN-1732.2.patch, YARN-1732.3.patch


 The current types Map<String, List<String>> relatedEntities and Map<String, 
 Object> primaryFilters have issues.  The List<String> value of the related 
 entities map could have multiple identical strings in it, which doesn't make 
 sense. A more major issue is that we cannot allow primary filter values to be 
 overwritten, because otherwise we will be unable to find those primary filter 
 entries when we want to delete an entity (without doing a nearly full scan).
 I propose changing related entities to Map<String, Set<String>> and primary 
 filters to Map<String, Set<Object>>.  The basic methods to add primary 
 filters and related entities are of the form add(key, value) and will not 
 need to change.
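
 A sketch of what the changed fields and add methods could look like (method names assumed from the add(key, value) form above; the final patch may differ):

{code:java}
// Sketch under the proposal's assumptions, not necessarily the final patch.
private Map<String, Set<String>> relatedEntities =
    new HashMap<String, Set<String>>();
private Map<String, Set<Object>> primaryFilters =
    new HashMap<String, Set<Object>>();

public void addRelatedEntity(String entityType, String entityId) {
  Set<String> ids = relatedEntities.get(entityType);
  if (ids == null) {
    ids = new HashSet<String>();
    relatedEntities.put(entityType, ids);
  }
  ids.add(entityId); // duplicate ids are now impossible by construction
}

public void addPrimaryFilter(String key, Object value) {
  Set<Object> values = primaryFilters.get(key);
  if (values == null) {
    values = new HashSet<Object>();
    primaryFilters.put(key, values);
  }
  values.add(value); // values accumulate instead of being overwritten
}
{code}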



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-02-17 Thread Patrick Wendell (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903589#comment-13903589 ]

Patrick Wendell commented on YARN-1530:
---

Hey,

Thanks for the explanation! Let me make sure I understand how this would all 
work by walking through an example.

For the Spark UI, we are currently implementing the ability to serialize and 
write events to HDFS, then load them later from a history server that can 
render the UI for finished jobs. AFAIK this is basically how MapReduce 
works as well (?)

Say users have set up a YARN cluster and have set up event ingestion into this 
shared store. Then Spark would need two things to integrate with it:

1. Be able to represent our events in JSON and hook into whatever source the 
user has set up for ingestion (flume, HDFS, etc).
2. Be able to render our history timeline UI by reading event data from this 
store.

Correct?

The benefit would be that if users set up something fancy like Flume, they 
could leverage the same infrastructure for Spark as for other applications, 
since there is a shared event model. Also, they would benefit from the faster 
indexed serving offered by this application when rendering the history UI... 

Is that the main idea? I'm just trying to figure out what redundant work is 
saved by having a generic framework, since each application writes its own UI 
and has its own event model. From what I can tell, the benefit is that shared 
ingestion and serving infrastructure can be used.

 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: application timeline design-20140108.pdf, application 
 timeline design-20140116.pdf, application timeline design-20140130.pdf, 
 application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to store and serve per-framework 
 data all by itself, as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner, with plugin points for frameworks 
 to do their own thing w.r.t. interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1732) Change types of related entities and primary filters in ATSEntity

2014-02-17 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903594#comment-13903594 ]

Hadoop QA commented on YARN-1732:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12629433/YARN-1732.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3114//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3114//console

This message is automatically generated.

 Change types of related entities and primary filters in ATSEntity
 -

 Key: YARN-1732
 URL: https://issues.apache.org/jira/browse/YARN-1732
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1732.1.patch, YARN-1732.2.patch, YARN-1732.3.patch


 The current types Map<String, List<String>> relatedEntities and Map<String, 
 Object> primaryFilters have issues.  The List<String> value of the related 
 entities map could have multiple identical strings in it, which doesn't make 
 sense. A more major issue is that we cannot allow primary filter values to be 
 overwritten, because otherwise we will be unable to find those primary filter 
 entries when we want to delete an entity (without doing a nearly full scan).
 I propose changing related entities to Map<String, Set<String>> and primary 
 filters to Map<String, Set<Object>>.  The basic methods to add primary 
 filters and related entities are of the form add(key, value) and will not 
 need to change.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1428) RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

2014-02-17 Thread Zhijie Shen (JIRA)

 [ https://issues.apache.org/jira/browse/YARN-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-1428:
--

Attachment: YARN-1428.3.patch

Thanks for your review, Jian! I uploaded a new patch with refactoring of 
TestRMAppTransitions and TestRMAppAttemptTransitions.

 RM cannot write the final state of RMApp/RMAppAttempt to the application 
 history store in the transition to the final state
 ---

 Key: YARN-1428
 URL: https://issues.apache.org/jira/browse/YARN-1428
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1428.1.patch, YARN-1428.2.patch, YARN-1428.3.patch


 ApplicationFinishData and ApplicationAttemptFinishData are written in the 
 final transitions of RMApp/RMAppAttempt respectively. However, in those 
 transitions, getState() does not return the state that RMApp/RMAppAttempt is 
 about to enter, but the prior one.
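
 A self-contained illustration of the pitfall (hypothetical names, not the actual RM code): the state field is only updated after the transition hook runs, so a hook that records the "final" state actually records the prior one.

{code:java}
// Hypothetical illustration, not actual RM code: the transition hook fires
// before the state machine installs the post-transition state.
enum AppState { RUNNING, FINISHED }

interface HistoryStore { void writeFinishData(AppState finalState); }

class App {
  private AppState state = AppState.RUNNING;

  void handleFinishEvent(HistoryStore store) {
    // The transition hook runs first...
    store.writeFinishData(state); // BUG: records RUNNING, not FINISHED
    // ...and only then does the state machine install the new state.
    state = AppState.FINISHED;
  }
}
{code}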



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-02-17 Thread Zhijie Shen (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903621#comment-13903621 ]

Zhijie Shen commented on YARN-1530:
---

bq. 1. Be able to represent our events in JSON and hook into whatever source 
the user has set up for ingestion (flume, HDFS, etc).

Currently, you can either compose the JSON content and send the HTTP request 
yourself, or make use of the ATSClient and ATS POJO classes (the names may be 
refactored). The latter way should be comparatively easier. If the 
communication medium is changed in the future, we're going to provide a 
relevant user lib to publish data easily.
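
A rough sketch of the client route (ATSEntity/ATSClient and their methods follow the POJOs mentioned above and may be renamed, as noted):

{code:java}
// Rough sketch; ATSEntity/ATSClient and their methods are the names as of
// this writing and may be refactored, as noted above.
ATSEntity entity = new ATSEntity();
entity.setEntityType("SPARK_APPLICATION"); // framework-chosen type
entity.setEntityId(appId.toString());
entity.addPrimaryFilter("user", userName); // indexed for later lookup
client.postEntities(entity);               // HTTP POST under the hood
{code}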

bq. Is that the main idea?

Yes, I think so. We want to relieve developers from building a history server 
from scratch. Let YARN provide the infrastructure for generic applications, 
and let each application focus on its individual logic, such as the data 
model and how to render the data.

 

 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: application timeline design-20140108.pdf, application 
 timeline design-20140116.pdf, application timeline design-20140130.pdf, 
 application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to store and serve per-framework 
 data all by itself, as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner, with plugin points for frameworks 
 to do their own thing w.r.t. interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1428) RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

2014-02-17 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903637#comment-13903637 ]

Hadoop QA commented on YARN-1428:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12629441/YARN-1428.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3115//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3115//console

This message is automatically generated.

 RM cannot write the final state of RMApp/RMAppAttempt to the application 
 history store in the transition to the final state
 ---

 Key: YARN-1428
 URL: https://issues.apache.org/jira/browse/YARN-1428
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1428.1.patch, YARN-1428.2.patch, YARN-1428.3.patch


 ApplicationFinishData and ApplicationAttemptFinishData are written in the 
 final transitions of RMApp/RMAppAttempt respectively. However, in those 
 transitions, getState() does not return the state that RMApp/RMAppAttempt is 
 about to enter, but the prior one.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1724) Race condition in Fair Scheduler when continuous scheduling is turned on

2014-02-17 Thread Junping Du (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903699#comment-13903699 ]

Junping Du commented on YARN-1724:
--

I would prefer to remove the sort rather than add a lock. Whether we sort or 
not, it will iterate over all nodes and see if it can do attemptScheduling(), 
which makes the sort sound unnecessary. Sorting would only make sense if we 
skipped some nodes with less resources, but then an additional lock may be 
needed. Thoughts?

 Race condition in Fair Scheduler when continuous scheduling is turned on 
 -

 Key: YARN-1724
 URL: https://issues.apache.org/jira/browse/YARN-1724
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1724-1.patch, YARN-1724.patch


 If nodes' resource allocations change during
 Collections.sort(nodeIdList, nodeAvailableResourceComparator);
 we'll hit:
 java.lang.IllegalArgumentException: Comparison method violates its general 
 contract!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1724) Race condition in Fair Scheduler when continuous scheduling is turned on

2014-02-17 Thread Karthik Kambatla (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903769#comment-13903769 ]

Karthik Kambatla commented on YARN-1724:


Sorting here helps balance load better between nodes. Given that not all 
containers run for the same duration, round-robin alone wouldn't lead to 
balanced load.
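
One way to keep the sort while avoiding the inconsistent-comparator race (a self-contained sketch with hypothetical names, not the FairScheduler code) is to snapshot the sort key before sorting, so every comparison sees consistent values:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Self-contained sketch with hypothetical names, not the FairScheduler code.
// Sorting directly by a live, mutable value lets the comparator see different
// values for the same node mid-sort, triggering "Comparison method violates
// its general contract!".
class NodeSortSketch {
  // Live view: available memory per node, mutated by allocation threads.
  static final Map<String, Integer> availableMb =
      new ConcurrentHashMap<String, Integer>();

  static List<String> nodesByAvailableResource(List<String> nodeIds) {
    // Copy the sort key once so it cannot change mid-sort.
    final Map<String, Integer> snapshot =
        new HashMap<String, Integer>(availableMb);
    List<String> sorted = new ArrayList<String>(nodeIds);
    Collections.sort(sorted, new Comparator<String>() {
      public int compare(String a, String b) {
        // Descending by snapshotted available resource.
        return Integer.compare(snapshot.get(b), snapshot.get(a));
      }
    });
    return sorted;
  }
}
{code}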

 Race condition in Fair Scheduler when continuous scheduling is turned on 
 -

 Key: YARN-1724
 URL: https://issues.apache.org/jira/browse/YARN-1724
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1724-1.patch, YARN-1724.patch


 If nodes' resource allocations change during
 Collections.sort(nodeIdList, nodeAvailableResourceComparator);
 we'll hit:
 java.lang.IllegalArgumentException: Comparison method violates its general 
 contract!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)