[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2015-09-17 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791625#comment-14791625
 ] 

Vrushali C commented on YARN-4062:
--

No, we do use it in ColumnHelper#getPutTimestamp to get the last 3 digits to 
append to the timestamp. 
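
For context, a minimal sketch of that supplementing scheme (illustrative only; 
the actual logic lives in ColumnHelper#getPutTimestamp, and the suffix 
derivation below is an assumption, not the real code):

{code}
// Hypothetical sketch: supplement the millisecond timestamp with 3 digits
// derived from the app id, so writes from different apps within the same
// millisecond land in distinct cell timestamps.
public static long getSupplementedTimestamp(long timestampMs, String appId) {
  long last3 = Math.abs(appId.hashCode()) % 1000; // 3-digit suffix
  return timestampMs * 1000 + last3;
}
{code}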

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing 
> into the flow_run table. It also needs flush & compaction processing in the 
> coprocessor, and perhaps a new scanner to deal with the data during the flush 
> and compaction stages.





[jira] [Assigned] (YARN-4174) Fix javadoc warnings floating up from hbase

2015-09-17 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reassigned YARN-4174:
-

Assignee: Sangjin Lee

> Fix javadoc warnings floating up from hbase 
> 
>
> Key: YARN-4174
> URL: https://issues.apache.org/jira/browse/YARN-4174
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Sangjin Lee
>Priority: Minor
>
> As part of the patch for YARN-3901, [~sjlee0] observed some (~200) javadoc 
> warnings coming from hbase classes. 
> We tried a bunch of things, like making the FlowRunCoprocessor class 
> non-public and excluding the package from the pom. If the class is made 
> non-public, table creation throws an exception.
> {code}
> 206 warnings
> [WARNING] Javadoc Warnings
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestWALObserver.class):
>  warning: Cannot find annotation method 'value()' in type 'Category': class 
> file for org.junit.experimental.categories.Category not found
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test': class 
> file for org.junit.Test not found
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverScannerOpenHook.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> 

[jira] [Updated] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3901:
-
Attachment: YARN-3901-YARN-2928.10.patch

Thanks [~sjlee0] for the javadoc warnings fix. I have included it now.

Also made the modifications to the javadocs as suggested.

Attaching patch v10.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being considered:
> - Stores per-flow-run information aggregated across applications, flow version
> - RM’s collector writes to it on app creation and app completion
> - Per-app collector writes to it for metric updates, at a slower frequency 
> than the metric updates to the application table
> - primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application levels keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value among all the cells of this 
> column will be written to a cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 
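
To make the flush/compaction collapse described above concrete, here is a 
minimal sketch of the min_start_time step using the HBase Cell API; the 
coprocessor wiring is omitted and the helper is illustrative, not the actual 
YARN-3901 code:

{code}
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.util.Bytes;

public final class MinStartTimeSketch {
  /**
   * Given all cells of the min_start_time column (one tagged cell per app
   * write), compute the minimum. On flush/compaction this single value would
   * be written back as one untagged cell and the tagged cells discarded.
   */
  public static long collapseMin(List<Cell> cells) {
    long min = Long.MAX_VALUE;
    for (Cell cell : cells) {
      min = Math.min(min, Bytes.toLong(CellUtil.cloneValue(cell)));
    }
    return min;
  }
}
{code}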





[jira] [Updated] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4000:
---
Attachment: (was: YARN-4000.03.patch)

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.





[jira] [Commented] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791698#comment-14791698
 ] 

Hudson commented on YARN-4034:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8469 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8469/])
YARN-4034. Render cluster Max Priority in scheduler metrics in RM web UI. 
Contributed by Rohith Sharma K S (jianhe: rev 
6c6e734f0baaa7b0f8d6b85963e1ce87bac28b17)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java


> Render cluster Max Priority in scheduler metrics in RM web UI
> -
>
> Key: YARN-4034
> URL: https://issues.apache.org/jira/browse/YARN-4034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4034.patch, 0001-YARN-4034.patch, 
> 0002-YARN-4034.patch, 0003-YARN-4034.patch, 0004-YARN-4034.patch, 
> YARN-4034.PNG
>
>
> Currently, Scheduler Metrics renders the common scheduler metrics in the RM 
> web UI. It would be helpful for the user to know the configured cluster max 
> priority from the web UI. 
> So, on the RM web UI front page, Scheduler Metrics can render the configured 
> max cluster priority.





[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791660#comment-14791660
 ] 

Hadoop QA commented on YARN-1897:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  23m 23s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 10 new or modified test files. |
| {color:red}-1{color} | javac |   7m 57s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m 22s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 13s | The applied patch generated  2 
new checkstyle issues (total was 32, now 34). |
| {color:green}+1{color} | whitespace |   2m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   8m 43s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests | 100m  9s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   7m  5s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  6s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 31s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |   8m 11s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  54m 47s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 232m 30s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.mapred.TestNetworkedJob |
|   | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12756403/YARN-1897-6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0832b38 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9179/console |


This message was automatically generated.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> 

[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791704#comment-14791704
 ] 

Naganarasimha G R commented on YARN-4176:
-

Hi [~bibinchundatt], 
Seems like this would be a better idea than what we did in YARN-4106, wherein 
we used a time interval of 1 min only on NM-side failure. But I have a few 
concerns/queries:
# I would suggest having only 1 resync configuration and removing what we 
introduced for YARN-4106.
# So node labels will be sent to the RM either if node labels have been 
modified since the last heartbeat or if the resync interval has elapsed, right?
# Earlier, the elapsed time was checked using 
{{System.currentTimeMillis()}}, but I think we need to use the approach 
mentioned by [~ste...@apache.org] in the 
[comment|https://issues.apache.org/jira/browse/HADOOP-12409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745056#comment-14745056]
 of HADOOP-12409. Hopefully [~xinxianyin] creates a new jira and provides a 
clock with monotonic time by then; if not, use {{System.nanoTime()}}. A sketch 
of such a combined check is below.
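
A minimal sketch of the combined send-on-update-or-resync check, assuming a 
labelsModified flag and a configured resyncIntervalMs (names are illustrative, 
not from any attached patch):

{code}
// Illustrative only: send labels on modification OR when the resync interval
// has elapsed, measured with a monotonic clock rather than currentTimeMillis().
private long lastSyncNanos = System.nanoTime();
private volatile boolean labelsModified;

private boolean shouldSendLabels(long resyncIntervalMs) {
  long elapsedMs = (System.nanoTime() - lastSyncNanos) / 1_000_000;
  if (labelsModified || elapsedMs >= resyncIntervalMs) {
    lastSyncNanos = System.nanoTime();
    labelsModified = false;
    return true;
  }
  return false;
}
{code}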


> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> This JIRA is for handling the below set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if 
> cluster nodelabels are removed and added, the NM doesn't resend labels in the 
> heartbeat again until there is a change in labels
> # If NM registration with nodelabels failed, the NM should resend the labels 
> to the RM again
> The above cases can be handled by resyncing nodeLabels with the RM every x interval
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> the NM will resend nodelabels to the RM based on this config regardless of 
> whether registration fails or succeeds.





[jira] [Commented] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791716#comment-14791716
 ] 

Hudson commented on YARN-4034:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #400 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/400/])
YARN-4034. Render cluster Max Priority in scheduler metrics in RM web UI. 
Contributed by Rohith Sharma K S (jianhe: rev 
6c6e734f0baaa7b0f8d6b85963e1ce87bac28b17)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java


> Render cluster Max Priority in scheduler metrics in RM web UI
> -
>
> Key: YARN-4034
> URL: https://issues.apache.org/jira/browse/YARN-4034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4034.patch, 0001-YARN-4034.patch, 
> 0002-YARN-4034.patch, 0003-YARN-4034.patch, 0004-YARN-4034.patch, 
> YARN-4034.PNG
>
>
> Currently, Scheduler Metrics renders the common scheduler metrics in the RM 
> web UI. It would be helpful for the user to know the configured cluster max 
> priority from the web UI. 
> So, on the RM web UI front page, Scheduler Metrics can render the configured 
> max cluster priority.





[jira] [Created] (YARN-4177) yarn.util.Clock should not be used to time a duration or time interval

2015-09-17 Thread Xianyin Xin (JIRA)
Xianyin Xin created YARN-4177:
-

 Summary: yarn.util.Clock should not be used to time a duration or 
time interval
 Key: YARN-4177
 URL: https://issues.apache.org/jira/browse/YARN-4177
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xianyin Xin


There are many places that use Clock to time intervals, which is dangerous, as 
commented by [~ste...@apache.org] in HADOOP-12409. Instead, we should use 
hadoop.util.Timer#monotonicNow() to get monotonic time. Or we could provide a 
MonotonicClock in yarn.util for consistency of the code.
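
A minimal sketch of what such a MonotonicClock could look like, assuming the 
existing yarn.util Clock interface with its single getTime() method:

{code}
import org.apache.hadoop.yarn.util.Clock;

// Illustrative sketch only. Returned values are only meaningful for measuring
// durations; they are not wall-clock times.
public class MonotonicClock implements Clock {
  @Override
  public long getTime() {
    // System.nanoTime() is monotonic, unlike System.currentTimeMillis().
    return System.nanoTime() / 1_000_000;
  }
}
{code}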





[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-09-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791775#comment-14791775
 ] 

Varun Saxena commented on YARN-3816:


[~djp],
{quote}
In TimelineMetric#accumulateTo, can latestMetric be TIME_SERIES? If not (seems 
to be the case as per the current code), is the else part of the condition if 
(latestMetric.getType().equals(Type.SINGLE_VALUE)) { required? Won't we be 
handling TIME_SERIES then?
I am not sure if I understand your comments correctly. But it definitely 
supports the TIME_SERIES type for latestMetrics and handles the two types separately.
{quote}
Actually, I should have worded my query differently. accumulateTo by itself can 
handle TIME_SERIES; this is more about the context of the caller. I am not sure 
if Li's patch is calling it, but in TimelineCollector#aggregateMetrics we have 
code like below. Here I see latestTimelineMetrics.retrieveSingleDataValue() 
being called, which will throw an exception if the metric type is not 
SINGLE_VALUE. What is the objective of throwing an exception here? As we have 
to get a single value for delta calculations, for TIME_SERIES maybe we can take 
the value at the latest timestamp. 
I was getting confused by this code (calling a method which throws an exception 
for time series), so I was wondering whether we won't be handling time series.
{code}
TimelineMetric latestTimelineMetrics = entityIdMap.get(entityId);

Number delta = null;
// new added metric for specific entityId
if (latestTimelineMetrics == null) {
  delta = metric.retrieveSingleDataValue();
} else {
  delta = TimelineMetricCalculator.sub(
      metric.retrieveSingleDataValue(),
      latestTimelineMetrics.retrieveSingleDataValue());
}
...
TimelineMetric newAggregatedArea = metric.accumulateTo(
    oldAggregatedArea, latestTimelineMetrics, aggregatedTime,
    TimelineMetric.Operation.SUM);
{code}
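
A sketch of the fallback suggested above, assuming the metric exposes its time 
series as a timestamp-to-value map (as TimelineMetric#getValues does); the 
helper name is illustrative:

{code}
import java.util.Collections;
import java.util.Map;

// Illustrative fallback: for TIME_SERIES, take the value at the latest
// timestamp instead of throwing from retrieveSingleDataValue().
static Number latestDataValue(Map<Long, Number> values) {
  if (values == null || values.isEmpty()) {
    return null;
  }
  return values.get(Collections.max(values.keySet()));
}
{code}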

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users aggregated state for each application, including: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of state at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when 
> based on application-level aggregations rather than raw entity-level data, as 
> far fewer rows need to be scanned (filtering out non-aggregated entities, 
> like events, configurations, etc.).





[jira] [Updated] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4000:
---
Attachment: YARN-4000.03.patch

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.





[jira] [Updated] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4000:
---
Attachment: YARN-4000.03.patch

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.





[jira] [Updated] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4000:
---
Attachment: YARN-4000.03.patch

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.





[jira] [Updated] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4000:
---
Attachment: (was: YARN-4000.03.patch)

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.





[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791626#comment-14791626
 ] 

Vrushali C commented on YARN-3901:
--

Thanks [~jrottinghuis], it is quite exciting to work on hbase cell tags and 
coprocessors. Looking forward to the upcoming jiras! 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being considered:
> - Stores per-flow-run information aggregated across applications, flow version
> - RM’s collector writes to it on app creation and app completion
> - Per-app collector writes to it for metric updates, at a slower frequency 
> than the metric updates to the application table
> - primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application levels keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value among all the cells of this 
> column will be written to a cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 





[jira] [Commented] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791793#comment-14791793
 ] 

Hudson commented on YARN-4034:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1141 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1141/])
YARN-4034. Render cluster Max Priority in scheduler metrics in RM web UI. 
Contributed by Rohith Sharma K S (jianhe: rev 
6c6e734f0baaa7b0f8d6b85963e1ce87bac28b17)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java


> Render cluster Max Priority in scheduler metrics in RM web UI
> -
>
> Key: YARN-4034
> URL: https://issues.apache.org/jira/browse/YARN-4034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4034.patch, 0001-YARN-4034.patch, 
> 0002-YARN-4034.patch, 0003-YARN-4034.patch, 0004-YARN-4034.patch, 
> YARN-4034.PNG
>
>
> Currently, Scheduler Metrics renders the common scheduler metrics in the RM 
> web UI. It would be helpful for the user to know the configured cluster max 
> priority from the web UI. 
> So, on the RM web UI front page, Scheduler Metrics can render the configured 
> max cluster priority.





[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-17 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791827#comment-14791827
 ] 

Bibin A Chundatt commented on YARN-4176:


Hi [~Naganarasimha],
Thanks for the comments.

{quote}
I would suggest having only 1 resync configuration and removing what we 
introduced for YARN-4106.
{quote}
Will be taken care of.
{quote}
So node labels will be sent to the RM either if node labels have been modified 
since the last heartbeat or if the resync interval has elapsed, right?
{quote}
Currently on heartbeat we send labels only when an update happens. It will 
become an *or* of update & resync.
Changing to {{System.nanoTime()}} will be handled too.

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> This JIRA is for handling the below set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if 
> cluster nodelabels are removed and added, the NM doesn't resend labels in the 
> heartbeat again until there is a change in labels
> # If NM registration with nodelabels failed, the NM should resend the labels 
> to the RM again
> The above cases can be handled by resyncing nodeLabels with the RM every x interval
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> the NM will resend nodelabels to the RM based on this config regardless of 
> whether registration fails or succeeds.





[jira] [Commented] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802698#comment-14802698
 ] 

Jian He commented on YARN-4000:
---

- Is this if condition a typo?
{code}
if (event.getDiagnosticMsg().isEmpty())

app.appDiagnosticsBeforeKilling =
event.getDiagnosticMsg().isEmpty() ? 
getAppKilledDiagnostics() : event.getDiagnosticMsg();
{code}
Instead of introducing the appDiagnosticsBeforeKilling field in RMAppImpl, I 
suggest doing the below changes in RMAppImpl and RMAppAttemptImpl:

{code}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
index ea9aa70..dc46326 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
@@ -1112,7 +1112,7 @@ private void 
rememberTargetTransitionsAndStoreState(RMAppEvent event,
   diags = getAppAttemptFailedDiagnostics(failedEvent);
   break;
 case ATTEMPT_KILLED:
-  diags = getAppKilledDiagnostics();
+  diags = event.getDiagnostics();
   break;
 default:
   break;
@@ -1209,21 +1209,17 @@ public AppKilledTransition() {
 
 @Override
 public void transition(RMAppImpl app, RMAppEvent event) {
-  app.diagnostics.append(getAppKilledDiagnostics());
+  app.diagnostics.append(event.getDiagnostics());
   super.transition(app, event);
 };
   }
 
-  private static String getAppKilledDiagnostics() {
-return "Application killed by user.";
-  }
-
   private static class KillAttemptTransition extends RMAppTransition {
 @Override
 public void transition(RMAppImpl app, RMAppEvent event) {
   app.stateBeforeKilling = app.getState();
   app.handler.handle(new RMAppAttemptEvent(app.currentAttempt
-.getAppAttemptId(), RMAppAttemptEventType.KILL));
+.getAppAttemptId(), RMAppAttemptEventType.KILL, 
event.getDiagnostics()));
 }
   }
 
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
index 629b2a3..d4f254e 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
@@ -1270,8 +1270,7 @@ public void transition(RMAppAttemptImpl appAttempt,
   appAttempt.invalidateAMHostAndPort();
   appEvent =
   new RMAppFailedAttemptEvent(applicationId,
-  RMAppEventType.ATTEMPT_KILLED,
-  "Application killed by user.", false);
+  RMAppEventType.ATTEMPT_KILLED, event.getDiagnostics(), 
false);
 }
 break;
 case FAILED:

{code}
- a random sleep may be flaky; use {{MockRM#waitForState(ApplicationId appId, 
RMAppState finalState)}} instead:
{code}
// Wait for app and attempt to be killed.
Thread.sleep(1000);
{code}
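
For example (assuming the test holds a MockRM named rm and an RMApp handle app; 
names are illustrative):

{code}
// Deterministic wait instead of Thread.sleep(1000):
rm.waitForState(app.getApplicationId(), RMAppState.KILLED);
{code}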

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.





[jira] [Comment Edited] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802698#comment-14802698
 ] 

Jian He edited comment on YARN-4000 at 9/17/15 10:04 AM:
-

- Is this if condition a typo?
{code}
if (event.getDiagnosticMsg().isEmpty())

app.appDiagnosticsBeforeKilling =
event.getDiagnosticMsg().isEmpty() ? 
getAppKilledDiagnostics() : event.getDiagnosticMsg();
{code}
Instead of introducing the appDiagnosticsBeforeKilling field in RMAppImpl, I 
suggest doing the below changes in RMAppImpl and RMAppAttemptImpl; the idea is 
to send the diagnostics from the app to the attempt and let the attempt send it 
back.

{code}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
index ea9aa70..dc46326 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
@@ -1112,7 +1112,7 @@ private void 
rememberTargetTransitionsAndStoreState(RMAppEvent event,
   diags = getAppAttemptFailedDiagnostics(failedEvent);
   break;
 case ATTEMPT_KILLED:
-  diags = getAppKilledDiagnostics();
+  diags = event.getDiagnostics();
   break;
 default:
   break;
@@ -1209,21 +1209,17 @@ public AppKilledTransition() {
 
 @Override
 public void transition(RMAppImpl app, RMAppEvent event) {
-  app.diagnostics.append(getAppKilledDiagnostics());
+  app.diagnostics.append(event.getDiagnostics());
   super.transition(app, event);
 };
   }
 
-  private static String getAppKilledDiagnostics() {
-return "Application killed by user.";
-  }
-
   private static class KillAttemptTransition extends RMAppTransition {
 @Override
 public void transition(RMAppImpl app, RMAppEvent event) {
   app.stateBeforeKilling = app.getState();
   app.handler.handle(new RMAppAttemptEvent(app.currentAttempt
-.getAppAttemptId(), RMAppAttemptEventType.KILL));
+.getAppAttemptId(), RMAppAttemptEventType.KILL, 
event.getDiagnostics()));
 }
   }
 
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
index 629b2a3..d4f254e 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
@@ -1270,8 +1270,7 @@ public void transition(RMAppAttemptImpl appAttempt,
   appAttempt.invalidateAMHostAndPort();
   appEvent =
   new RMAppFailedAttemptEvent(applicationId,
-  RMAppEventType.ATTEMPT_KILLED,
-  "Application killed by user.", false);
+  RMAppEventType.ATTEMPT_KILLED, event.getDiagnostics(), 
false);
 }
 break;
 case FAILED:

{code}
- a random sleep may be flaky; use {{MockRM#waitForState(ApplicationId appId, 
RMAppState finalState)}} instead:
{code}
// Wait for app and attempt to be killed.
Thread.sleep(1000);
{code}


was (Author: jianhe):
- Is this if condition a typo?
{code}
if (event.getDiagnosticMsg().isEmpty())

app.appDiagnosticsBeforeKilling =
event.getDiagnosticMsg().isEmpty() ? 
getAppKilledDiagnostics() : event.getDiagnosticMsg();
{code}
Instead of introducing the appDiagnosticsBeforeKilling field in RMAppImpl, I 
suggest doing the below changes in RMAppImpl and RMAppAttemptImpl

{code}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
index ea9aa70..dc46326 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
+++ 

[jira] [Commented] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802700#comment-14802700
 ] 

Hudson commented on YARN-4034:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2347 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2347/])
YARN-4034. Render cluster Max Priority in scheduler metrics in RM web UI. 
Contributed by Rohith Sharma K S (jianhe: rev 
6c6e734f0baaa7b0f8d6b85963e1ce87bac28b17)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java


> Render cluster Max Priority in scheduler metrics in RM web UI
> -
>
> Key: YARN-4034
> URL: https://issues.apache.org/jira/browse/YARN-4034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4034.patch, 0001-YARN-4034.patch, 
> 0002-YARN-4034.patch, 0003-YARN-4034.patch, 0004-YARN-4034.patch, 
> YARN-4034.PNG
>
>
> Currently, Scheduler Metrics renders the common scheduler metrics in the RM 
> web UI. It would be helpful for the user to know the configured cluster max 
> priority from the web UI. 
> So, on the RM web UI front page, Scheduler Metrics can render the configured 
> max cluster priority.





[jira] [Updated] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4000:
---
Attachment: YARN-4000.04.patch

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch, YARN-4000.04.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.





[jira] [Commented] (YARN-4171) Resolve findbugs/javac warnings in YARN-1197 branch

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802785#comment-14802785
 ] 

Hadoop QA commented on YARN-4171:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 34s | Pre-patch YARN-1197 has 1 
extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 47s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings, and fixes 1 pre-existing warnings. |
| {color:red}-1{color} | yarn tests |  55m 36s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 39s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12756309/YARN-4171-YARN-1197.1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-1197 / 733b0f6 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9184/artifact/patchprocess/YARN-1197FindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9184/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9184/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9184/console |


This message was automatically generated.

> Resolve findbugs/javac warnings in YARN-1197 branch
> ---
>
> Key: YARN-4171
> URL: https://issues.apache.org/jira/browse/YARN-4171
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4171-YARN-1197.1.patch
>
>






[jira] [Commented] (YARN-4155) TestLogAggregationService.testLogAggregationServiceWithInterval failing

2015-09-17 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802731#comment-14802731
 ] 

Bibin A Chundatt commented on YARN-4155:


Hi [~ste...@apache.org],
Could you please review the attached patch?



> TestLogAggregationService.testLogAggregationServiceWithInterval failing
> ---
>
> Key: YARN-4155
> URL: https://issues.apache.org/jira/browse/YARN-4155
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Steve Loughran
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4155.patch, 0001-YARN-4155.patch
>
>
> Test failing on Jenkins: 
> {{TestLogAggregationService.testLogAggregationServiceWithInterval}}





[jira] [Commented] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802739#comment-14802739
 ] 

Hudson commented on YARN-4034:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2322 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2322/])
YARN-4034. Render cluster Max Priority in scheduler metrics in RM web UI. 
Contributed by Rohith Sharma K S (jianhe: rev 
6c6e734f0baaa7b0f8d6b85963e1ce87bac28b17)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java


> Render cluster Max Priority in scheduler metrics in RM web UI
> -
>
> Key: YARN-4034
> URL: https://issues.apache.org/jira/browse/YARN-4034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4034.patch, 0001-YARN-4034.patch, 
> 0002-YARN-4034.patch, 0003-YARN-4034.patch, 0004-YARN-4034.patch, 
> YARN-4034.PNG
>
>
> Currently Scheduler Metric renders the common scheduler metrics in RM web UI. 
> It would be helpful for the user to know what is the configured cluster max 
> priority from web UI. 
> So, in RM web UI front page, Scheduler Metrics can render configured max 
> cluster priority.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802738#comment-14802738
 ] 

Varun Saxena commented on YARN-4000:


bq. is this if condition a typo ?
Yes, I had uploaded the wrong patch and realised it only after the QA report. I 
have updated the patch again.

bq. the idea is to send the diagnostics from app to attempt and let attempt 
send it back.
Ok, let's do it this way.

bq. random sleep may be flicky, use MockRM#waitForState(ApplicationId appId, 
RMAppState finalState) instead
Ok, will use it.
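
For reference, a sketch of the suggested pattern (the submitApp arguments and 
target state below are illustrative only):
{code}
// Wait deterministically for the app to reach the expected state instead of
// sleeping for a fixed interval.
MockRM rm = new MockRM(conf);
rm.start();
RMApp app = rm.submitApp(200);
rm.waitForState(app.getApplicationId(), RMAppState.ACCEPTED);
{code}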

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch, YARN-4000.04.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802754#comment-14802754
 ] 

Hudson commented on YARN-4034:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #383 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/383/])
YARN-4034. Render cluster Max Priority in scheduler metrics in RM web UI. 
Contributed by Rohith Sharma K S (jianhe: rev 
6c6e734f0baaa7b0f8d6b85963e1ce87bac28b17)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* hadoop-yarn-project/CHANGES.txt


> Render cluster Max Priority in scheduler metrics in RM web UI
> -
>
> Key: YARN-4034
> URL: https://issues.apache.org/jira/browse/YARN-4034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4034.patch, 0001-YARN-4034.patch, 
> 0002-YARN-4034.patch, 0003-YARN-4034.patch, 0004-YARN-4034.patch, 
> YARN-4034.PNG
>
>
> Currently Scheduler Metric renders the common scheduler metrics in RM web UI. 
> It would be helpful for the user to know what is the configured cluster max 
> priority from web UI. 
> So, in RM web UI front page, Scheduler Metrics can render configured max 
> cluster priority.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4034) Render cluster Max Priority in scheduler metrics in RM web UI

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791805#comment-14791805
 ] 

Hudson commented on YARN-4034:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #407 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/407/])
YARN-4034. Render cluster Max Priority in scheduler metrics in RM web UI. 
Contributed by Rohith Sharma K S (jianhe: rev 
6c6e734f0baaa7b0f8d6b85963e1ce87bac28b17)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerLeafQueueInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java


> Render cluster Max Priority in scheduler metrics in RM web UI
> -
>
> Key: YARN-4034
> URL: https://issues.apache.org/jira/browse/YARN-4034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4034.patch, 0001-YARN-4034.patch, 
> 0002-YARN-4034.patch, 0003-YARN-4034.patch, 0004-YARN-4034.patch, 
> YARN-4034.PNG
>
>
> Currently Scheduler Metric renders the common scheduler metrics in RM web UI. 
> It would be helpful for the user to know what is the configured cluster max 
> priority from web UI. 
> So, in RM web UI front page, Scheduler Metrics can render configured max 
> cluster priority.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791814#comment-14791814
 ] 

Hadoop QA commented on YARN-4000:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   8m 13s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 57s | The applied patch generated  4 
new checkstyle issues (total was 564, now 559). |
| {color:red}-1{color} | whitespace |   0m 14s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  59m  1s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 100m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12756435/YARN-4000.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c6e734 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9183/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9183/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9183/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9183/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9183/console |


This message was automatically generated.

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802688#comment-14802688
 ] 

Steve Loughran commented on YARN-4176:
--

I no longer trust nanoTime(). I'll do a blog post on it, but the summary is: on 
multi-core/multi-socket systems you may get either inconsistent results or time 
data from a clock that is even less granular than getTimeMillis().

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> This JIRA is for handling the below set of issues:
> # Distributed nodelabels: after the NM has registered with the RM, if cluster 
> nodelabels are removed and added, the NM doesn't resend labels in the 
> heartbeat again until there is any change in labels.
> # If NM registration with nodelabels fails, the NM should resend the labels 
> to the RM again.
> The above cases can be handled by resyncing nodeLabels with the RM every x 
> interval.
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> the NM will resend nodelabels to the RM based on this config, no matter 
> whether the registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-09-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802848#comment-14802848
 ] 

Varun Saxena commented on YARN-3816:


I mean for TIME_SERIES we can take the value associated with the latest 
timestamp.
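
As a sketch of what that could look like (assuming the metric exposes its 
values as a {{Map<Long, Number>}} keyed by timestamp):
{code}
// Pick the single value at the latest timestamp from a time series.
Map<Long, Number> series = metric.getValues();
long latestTimestamp = Collections.max(series.keySet());
Number latestValue = series.get(latestTimestamp);
{code}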

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end users aggregated states for each application, including: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when 
> based on application-level aggregations rather than raw entity-level data, as 
> far fewer rows need to be scanned (after filtering out non-aggregated 
> entities like events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4177) yarn.util.Clock should not be used to time a duration or time interval

2015-09-17 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4177:
--
Attachment: YARN-4177.001.patch

Provide a MonotonicClock.
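
A minimal sketch of what such a clock could look like, assuming it implements 
the existing {{yarn.util.Clock}} interface and delegates to 
{{org.apache.hadoop.util.Time#monotonicNow()}}:
{code:title=MonotonicClock.java (sketch)}
package org.apache.hadoop.yarn.util;

import org.apache.hadoop.util.Time;

// A Clock backed by a monotonic time source, safe for measuring durations
// because it is unaffected by wall-clock adjustments such as NTP.
public class MonotonicClock implements Clock {
  @Override
  public long getTime() {
    return Time.monotonicNow();
  }
}
{code}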

> yarn.util.Clock should not be used to time a duration or time interval
> --
>
> Key: YARN-4177
> URL: https://issues.apache.org/jira/browse/YARN-4177
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xianyin Xin
> Attachments: YARN-4177.001.patch
>
>
> There're many places uses Clock to time intervals, which is dangerous as 
> commented by [~ste...@apache.org] in HADOOP-12409. Instead, we should use 
> hadoop.util.Timer#monotonicNow() to get monotonic time. Or we could provide a 
> MonotonicClock in yarn.util considering the consistency of code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-09-17 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803122#comment-14803122
 ] 

MENG DING commented on YARN-4138:
-

There is an issue with the current logic:

{code:title=RMContainerImpl.java}

+  if (!changeEvent.isIncrease()) {
+// if this is a decrease request, if container was increased but not
+// told to NM, we can consider previous increase is cancelled,
+// unregister from the containerAllocationExpirer
+container.containerAllocationExpirer.unregister(container
+.getContainerId());
+  }  
{code}

Right now, if the RM is processing a decrease request on a container, it 
(intends to) cancel any ongoing increase action on the same container by 
removing the container from the allocation expirer. This is correct if the 
target resource is less than or equal to the last confirmed resource; 
otherwise it will cause inconsistencies. For example:

1. A container is using 2G
2. AM requests to increase it from 2G --> 8G, and the scheduler allocates it 
and issues the token to AM
3. AM never uses the token, but requests to decrease the container from 8G --> 
6G, and the scheduler goes ahead and decreases the resource to 6G, and also 
removes the container from the allocation expirer
4. RM notifies NM to decrease the resource to 6G, but since NM is still using 
2G, the decrease message is ignored by NM
5. Now the container has a 6G allocation in RM, but a 2G allocation in NM.

In this ticket, we will add a last confirmed resource to RMContainer, and I 
propose to only unregister the container from the expirer when the target 
resource is less than or equal to the last confirmed resource. Using the above 
example, after the fix the behavior should be:

1. A container is using 2G
2. AM requests to increase it from 2G --> 8G, and the scheduler allocates it 
and issues the token to AM
3. AM requests to decrease the container from 8G --> 6G. Scheduler decreases 
it to 6G, but does *not* remove the container from the allocation expirer
4. The increase token expires, and the scheduler reverts the container 
resource from 6G back to 2G.

Let me know if this makes sense or not. If yes, I will come up with a patch 
shortly.
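
For illustration, a minimal sketch of that check (the accessors 
{{getTargetResource()}} and {{getLastConfirmedResource()}} are placeholders for 
the proposed additions, not existing APIs):
{code:title=RMContainerImpl.java (sketch)}
if (!changeEvent.isIncrease()) {
  // Only treat the decrease as cancelling the pending increase when the
  // target resource fits within the last confirmed resource; otherwise keep
  // the expirer registration so an unacknowledged increase can be rolled back.
  if (Resources.fitsIn(changeEvent.getTargetResource(),
      container.getLastConfirmedResource())) {
    container.containerAllocationExpirer.unregister(
        container.getContainerId());
  }
}
{code}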

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4066) Large number of queues choke fair scheduler

2015-09-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4066:
---
Assignee: Johan Gustavsson

> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Attachments: yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-09-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802943#comment-14802943
 ] 

Naganarasimha G R commented on YARN-4176:
-

[~ste...@apache.org],
 Thanks for the comments, but based on the earlier comment 
bq. Clock-wise, how about adding a new method, `monotonicTimeMillis()`, which 
is just nanoTime/1e6; easy to switch from one to the other.
I thought by {{nanoTime/1e6}} you meant {{System.nanoTime()/1000000}}, which is 
similar to the modification you had asked me to 
[refer|https://github.com/apache/incubator-slider/blob/develop/slider-core/src/main/java/org/apache/slider/common/tools/Duration.java]
 to, but if that is also inconsistent, is there any other option?
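
For clarity, the conversion under discussion is simply (a sketch):
{code}
public static long monotonicTimeMillis() {
  // System.nanoTime() is monotonic but counts nanoseconds; divide by
  // 1,000,000 to get milliseconds.
  return System.nanoTime() / 1000000;
}
{code}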

[~bibinchundatt],
Missed one more point: let the interval configuration name be 
{{yarn.nodemanager.node-labels.resync-interval-ms}}, as the configurations 
under the provider prefix were used only for the config and label based 
providers.
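
A sketch of the proposed property in yarn-site.xml (the name is still under 
discussion here, and the value is an arbitrary example):
{noformat}
<property>
  <name>yarn.nodemanager.node-labels.resync-interval-ms</name>
  <value>120000</value>
</property>
{noformat}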

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> This JIRA is for handling the below set of issues:
> # Distributed nodelabels: after the NM has registered with the RM, if cluster 
> nodelabels are removed and added, the NM doesn't resend labels in the 
> heartbeat again until there is any change in labels.
> # If NM registration with nodelabels fails, the NM should resend the labels 
> to the RM again.
> The above cases can be handled by resyncing nodeLabels with the RM every x 
> interval.
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> the NM will resend nodelabels to the RM based on this config, no matter 
> whether the registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4160) Dynamic NM Resources Configuration file should be simplified.

2015-09-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803112#comment-14803112
 ] 

Junping Du commented on YARN-4160:
--

The proposed new configuration is as follows:
{noformat}
<configuration>
  <nodes>
     <id>node_1, node_2, node_3</id>
     <vcores>1</vcores>
     <memory>1024</memory>
  </nodes>
  <nodes>
     <id>node_4, node_5</id>
     <vcores>2</vcores>
     <memory>2048</memory>
  </nodes>
</configuration>
{noformat}
Any comments?
Btw, the NodeID can be either Host:Port or Host only, for the user's 
convenience.

> Dynamic NM Resources Configuration file should be simplified.
> -
>
> Key: YARN-4160
> URL: https://issues.apache.org/jira/browse/YARN-4160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> In YARN-313, we provide CLI to refresh NMs' resources dynamically. The format 
> of dynamic-resources.xml is something like the following:
> {noformat}
> <configuration>
>   <property>
>     <name>yarn.resource.dynamic.node_id_1.vcores</name>
>     <value>16</value>
>   </property>
>   <property>
>     <name>yarn.resource.dynamic.node_id_1.memory</name>
>     <value>1024</value>
>   </property>
> </configuration>
> {noformat}
> Per the review comments on YARN-313, this looks too redundant. We should 
> have a better, more concise format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803151#comment-14803151
 ] 

Karthik Kambatla commented on YARN-4066:


Thanks again for working on this, Johan. Took a closer look at the patch and 
have the following comments:
# A few lines are longer than 80 characters.
# For the method parameters, {{recomputeSteadyShares}} might be more 
descriptive than {{recalculate}}
# While at it, I would suggest the following improvements in synchronization as 
well:
## In getQueue, some of the code could be outside the synchronized block
{code}
name = ensureRootPrefix(name);
FSQueue queue;
boolean recalculate = true;
synchronized (queues) {
  queue = queues.get(name);
  if (queue == null && create) {
    // if the queue doesn't exist, create it and return
    queue = createQueue(name, queueType);
  } else {
    recalculate = false;
  }
}

if (recalculate) {
  rootQueue.recomputeSteadyShares();
}
return queue;
{code}
## In updateAllocationConfiguration, club the two synchronized blocks into one, 
and recomputeSteadyShares outside the synchronized block.

Since we are changing some of the locking in ways that would be hard to 
unit-test, I would appreciate it if you could run the updated patch through 
the tests you previously reported. 

> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
> Attachments: yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803119#comment-14803119
 ] 

Karthik Kambatla commented on YARN-4066:


Looking into it. 

> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
> Attachments: yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803137#comment-14803137
 ] 

Junping Du commented on YARN-3901:
--

Patch LGTM too. Looks like Jenkins wasn't triggered against the v10 patch for 
some reason, so I just kicked it off manually. 
Let's wait for the Jenkins result.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803140#comment-14803140
 ] 

Hadoop QA commented on YARN-3816:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 54s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  0s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 37s | The applied patch generated  1 
new checkstyle issues (total was 252, now 251). |
| {color:green}+1{color} | whitespace |   0m 34s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 12s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 40s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |   1m 36s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  58m  4s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757084/YARN-3816-YARN-2928-v3.1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b1960e0 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9189/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9189/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9189/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9189/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9189/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9189/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9189/console |


This message was automatically generated.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end users aggregated states for each application, including: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when 
> based on application-level aggregations rather than raw entity-level data, as 
> far fewer rows need to be scanned (after filtering out non-aggregated 
> entities like events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803190#comment-14803190
 ] 

Hadoop QA commented on YARN-4141:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 54s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 16s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  51m 18s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  90m 48s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757094/0005-YARN-4141.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 58d1a02 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9190/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9190/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9190/console |


This message was automatically generated.

> Runtime Application Priority change should not throw exception for 
> applications at finishing states
> ---
>
> Key: YARN-4141
> URL: https://issues.apache.org/jira/browse/YARN-4141
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch, 
> 0003-YARN-4141.patch, 0004-YARN-4141.patch, 0005-YARN-4141.patch
>
>
> As suggested by [~jlowe] in 
> [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
>  , it would be good if YARN could suppress exceptions during change 
> application priority calls for applications in their finishing stages.
> Currently it is difficult for clients to handle this. This will be 
> similar to the kill application behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803109#comment-14803109
 ] 

Sangjin Lee commented on YARN-3901:
---

The latest patch (v.10) LGTM. Thanks much [~vrushalic] for the update!

Please let me know if you have additional feedback. I'd like to commit this 
soon.

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values. - 
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2015-09-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803100#comment-14803100
 ] 

Karthik Kambatla commented on YARN-3446:


Thanks for rebasing the patch, [~zxu]. Comments:

FSAppAttempt:
# How about using a helper method {{subtractResourcesOnBlacklistedNodes}} 
instead of adding all the logic to {{getHeadroom}} itself?
# Is the optimization to get the blacklist only when it has changed necessary? 
Looks like we optimize the fetch, but not the iteration on it. I think we 
should either go all the way and optimize iterating on the blacklist nodes as 
well only when the blacklist has changed, or leave out the optimization until 
we see a need for it. 
# To get the blacklist, can't we just use {{AppSchedulingInfo#getBlacklist}} 
(needs synchronization) or {{AppSchedulingInfo#getBlacklistCopy}}? Do we need 
the methods in the scheduler? 

If we make these changes, we might not need all the changes in the rest of the 
files.
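
A rough sketch of the suggested helper ({{getBlacklistedNodes()}} is a 
hypothetical accessor that resolves the app's blacklist entries to scheduler 
nodes):
{code:title=FSAppAttempt.java (sketch)}
private Resource subtractResourcesOnBlacklistedNodes(Resource headroom) {
  for (FSSchedulerNode node : getBlacklistedNodes()) {
    // The app can never get containers on a blacklisted node, so its
    // available resources should not count towards the headroom.
    Resources.subtractFrom(headroom, node.getAvailableResource());
  }
  // Floor at zero so the headroom never goes negative.
  headroom.setMemory(Math.max(headroom.getMemory(), 0));
  headroom.setVirtualCores(Math.max(headroom.getVirtualCores(), 0));
  return headroom;
}
{code}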


> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because for Reducer preemption 
> calculation, headRoom is considering blacklisted nodes. This makes jobs to 
> hang forever(ResourceManager does not assign any new containers on 
> blacklisted nodes but availableResource AM get from RM includes blacklisted 
> nodes available resource).
> This issue is similar as YARN-1680 which is for Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4177) yarn.util.Clock should not be used to time a duration or time interval

2015-09-17 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802911#comment-14802911
 ] 

Xianyin Xin commented on YARN-4177:
---

In following steps I will try to fix several important places where the clock 
is misused.

> yarn.util.Clock should not be used to time a duration or time interval
> --
>
> Key: YARN-4177
> URL: https://issues.apache.org/jira/browse/YARN-4177
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xianyin Xin
> Attachments: YARN-4177.001.patch
>
>
> There're many places uses Clock to time intervals, which is dangerous as 
> commented by [~ste...@apache.org] in HADOOP-12409. Instead, we should use 
> hadoop.util.Timer#monotonicNow() to get monotonic time. Or we could provide a 
> MonotonicClock in yarn.util considering the consistency of code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802995#comment-14802995
 ] 

Hadoop QA commented on YARN-3816:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 30s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   8m  7s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 16s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 38s | The applied patch generated  1 
new checkstyle issues (total was 252, now 251). |
| {color:red}-1{color} | checkstyle |   2m 23s | The applied patch generated  
29 new checkstyle issues (total was 0, now 29). |
| {color:red}-1{color} | checkstyle |   2m 35s | The applied patch generated  
41 new checkstyle issues (total was 0, now 41). |
| {color:green}+1{color} | whitespace |   0m 35s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 15s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   7m 51s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |   1m 46s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  59m 59s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757084/YARN-3816-YARN-2928-v3.1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b1960e0 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9188/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9188/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9188/artifact/patchprocess/diffcheckstylehadoop-yarn-server-timelineservice.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9188/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9188/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9188/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9188/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9188/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9188/console |


This message was automatically generated.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end users aggregated states for each application, including: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states at the framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when 
> based on application-level aggregations rather than raw entity-level data, as 
> far fewer rows need to be scanned (after filtering out non-aggregated 
> entities like events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states

2015-09-17 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4141:
--
Attachment: 0005-YARN-4141.patch

Thank you [~jlowe]. Yes, it's a good optimization. Updating the patch based on 
the same.

> Runtime Application Priority change should not throw exception for 
> applications at finishing states
> ---
>
> Key: YARN-4141
> URL: https://issues.apache.org/jira/browse/YARN-4141
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch, 
> 0003-YARN-4141.patch, 0004-YARN-4141.patch, 0005-YARN-4141.patch
>
>
> As suggested by [~jlowe] in 
> [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
>  , it would be good if YARN could suppress exceptions during change 
> application priority calls for applications in their finishing stages.
> Currently it is difficult for clients to handle this. This will be 
> similar to the kill application behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-09-17 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802894#comment-14802894
 ] 

Varun Vasudev commented on YARN-4009:
-

bq. What happens when both of these are enabled at the same time with different 
settings?

If the filter is set in core-site.xml, it will be enabled irrespective of the 
value set in yarn.timeline-service.http-cross-origin.enabled. There is no 
change in behavior here - if a user sets the existing timeline CORS filter in 
core-site.xml, it will override the value set in 
yarn.timeline-service.http-cross-origin.enabled.

bq.  Is there a need to allow the design to enable this only for webservices 
(REST APIs) instead of the whole webserver (builtin UIs and REST apis)?

I don't think there's any such need. As far as I can tell, we don't treat HTTP 
requests differently, i.e. webservices and builtin UI requests are treated the 
same.

bq. Not sure if there is a question of selecting enabling cors support for 
different services such as NN webservices vs RM webservices.

I thought about this - I'm not sure there's any benefit to adding one more set 
of config knobs. The chances are if you're fine with enabling CORS for the RM, 
you're probably fine enabling it for the NMs and the timeline server as well. 
If a user wishes to disable or enable CORS for a particular service, they can 
always set the filter initializers to the appropriate value on the node the 
service is running on (either via core-site.xml or yarn-site.xml).
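
For example, a sketch of enabling the existing timeline CORS filter globally 
via core-site.xml (assuming the timeline service's CrossOriginFilterInitializer 
class):
{noformat}
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilterInitializer</value>
</property>
{noformat}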



> CORS support for ResourceManager REST API
> -
>
> Key: YARN-4009
> URL: https://issues.apache.org/jira/browse/YARN-4009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Vasudev
> Attachments: YARN-4009.001.patch, YARN-4009.002.patch, 
> YARN-4009.003.patch, YARN-4009.004.patch
>
>
> Currently the REST API's do not have CORS support. This means any UI (running 
> in browser) cannot consume the REST API's. For ex Tez UI would like to use 
> the REST API for getting application, application attempt information exposed 
> by the API's. 
> It would be very useful if CORS is enabled for the REST API's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803039#comment-14803039
 ] 

Junping Du commented on YARN-1897:
--

Thanks [~mingma] for replying the comments.
bq. Yes, the approach taken in YARN-4131 is simpler by leveraging the existing 
protocol (to accomplish the kill container scenario). But changing the NM-RM 
protocol will allow us to support other useful scenarios besides kill container 
and thread dump.
Agree. I don't mean the previous approach (YARN-4131) can replace the approach 
here; I just want to understand if the approach here can cover all cases that 
YARN-4131 tries to address. Sounds like we still need YARN-4131's approach even 
when the patch here goes in. Please see comments below for details.

bq. Kill container via preemption. This means RM will know about it first 
before NM, different from the signal container order which kills container 
without RM's knowledge first. It seems killing container without RM knowledge 
matches container crash test case better. But killing container via preemption 
can simulate preemption. But does it matter here as long as container is killed?
Yes, it does matter. Preempted containers won't be counted as container 
failures from the AM's perspective and won't affect the success of the 
application's run. In some tests, we need to emulate both cases instead of one.

bq. Container Expiration. Is that only for a container that has been 
allocated/acquired before it is in running state? It seems it is used by RM to 
time out on container allocation/acquisition. It will trigger 
RMContainerEventType.EXPIRE and won't have impact on running container.
Sorry, I meant the container LOST situation. If we want to emulate the case 
where the NM gets shut down suddenly (kill -9) and never comes back, and its 
impact on RMContainers, we may not be able to achieve this via the NM-RM 
protocol; it may be better to generate a timeout event from the RM directly.

My overall thinking is that there are two kinds of sources that affect 
containers' state (from the RM's standpoint): the first is state update events 
triggered from the container/NM, which covers the mainstream cases of the 
container lifecycle and is well addressed by the approach here; the other is 
events generated in the RM itself, like resource/container preemption, losing 
contact with an NM that has running containers, etc. I would prefer YARN-4131 
to address the second kind of event, as an addendum to the approach here. What 
do you think?

BTW, it sounds like the test failure in 
TestContainerManager.testForcefulShutdownSignal is related?

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as 
> they are needed by other sub tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way to application to specify "reason" 
> for diagnosis. SignalContainerResponse might be empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803015#comment-14803015
 ] 

Hadoop QA commented on YARN-4000:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 12s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m  4s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 15s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 54s | The applied patch generated  1 
new checkstyle issues (total was 616, now 599). |
| {color:red}-1{color} | whitespace |   0m 26s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  59m  6s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 100m  7s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757077/YARN-4000.05.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c6e734 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9186/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9186/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9186/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9186/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9186/console |


This message was automatically generated.

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch, YARN-4000.04.patch, YARN-4000.05.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-09-17 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3816:
-
Attachment: YARN-3816-YARN-2928-v3.1.patch

Fixed checkstyle issues (3 of 4; the last one shouldn't be fixed) and added a 
NOTE to the aggregateMetrics() method.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users aggregated state for each application, including: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show framework-level details.
> - Aggregation at other levels (Flow/User/Queue) can be done more efficiently 
> on top of application-level aggregations rather than raw entity-level data, 
> as far fewer rows need to be scanned (filtering out non-aggregated entities, 
> like events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802985#comment-14802985
 ] 

Hadoop QA commented on YARN-3816:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 12s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   8m 13s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 19s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 38s | The applied patch generated  1 
new checkstyle issues (total was 252, now 251). |
| {color:green}+1{color} | whitespace |   0m 35s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 15s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   7m 35s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |   1m 55s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  59m 21s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
|   | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757084/YARN-3816-YARN-2928-v3.1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b1960e0 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9187/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9187/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9187/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9187/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9187/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9187/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9187/console |


This message was automatically generated.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users aggregated state for each application, including: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show framework-level details.
> - Aggregation at other levels (Flow/User/Queue) can be done more efficiently 
> on top of application-level aggregations rather than raw entity-level data, 
> as far fewer rows need to be scanned (filtering out non-aggregated entities, 
> like events, 

[jira] [Updated] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4000:
---
Attachment: YARN-4000.05.patch

> RM crashes with NPE if leaf queue becomes parent queue during restart
> -
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch, YARN-4000.04.patch, YARN-4000.05.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and then the RM restarts with a changed capacity scheduler 
> configuration where queue A becomes a parent queue to other subqueues then 
> the RM will crash with a NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-09-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14802898#comment-14802898
 ] 

Junping Du commented on YARN-3816:
--

bq. This is more from the context of caller. I am not sure if Li's patch is 
calling it but in TimelineCollector#aggregateMetrics, we have code like below. 
Here, I see latestTimelineMetrics.retrieveSingleDataValue() being called, which 
will throw an exception if metric type is not SINGLE_VALUE.
You are right that this caller case (aggregating container metrics) currently 
only handles single-value metrics, because we only generate single-value 
metrics for containers right now. I will state this clearly in the javadoc.
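
For illustration, a minimal sketch of the kind of guard this implies. The 
helper and loop are hypothetical; retrieveSingleDataValue() and the 
TimelineMetric type enum are as referenced in this thread:

{code}
// Sketch of the guard discussed above; the helper is hypothetical, but
// retrieveSingleDataValue() is the call that throws for TIME_SERIES metrics.
private static boolean isSingleValue(TimelineMetric metric) {
  return metric.getType() == TimelineMetric.Type.SINGLE_VALUE;
}

// Caller side (sketch): skip, rather than throw on, time-series metrics
// until the aggregation path supports them.
for (TimelineMetric metric : entity.getMetrics()) {
  if (!isSingleValue(metric)) {
    continue;
  }
  Number value = metric.retrieveSingleDataValue();
  // ... accumulate value into the app-level aggregate ...
}
{code}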

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.patch, 
> YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application-level aggregation of Timeline data:
> - To present end users aggregated state for each application, including: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show framework-level details.
> - Aggregation at other levels (Flow/User/Queue) can be done more efficiently 
> on top of application-level aggregations rather than raw entity-level data, 
> as far fewer rows need to be scanned (filtering out non-aggregated entities, 
> like events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4170) AM need to be notified with priority in AllocateResponse

2015-09-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803040#comment-14803040
 ] 

Sunil G commented on YARN-4170:
---

Adding the comment from 
[MAPREDUCE-5870-link|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
 where this point was discussed.

> AM need to be notified with priority in AllocateResponse 
> -
>
> Key: YARN-4170
> URL: https://issues.apache.org/jira/browse/YARN-4170
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
>
> As discussed in MAPREDUCE-5870, the Application Master needs to be notified of 
> the priority in the Allocate heartbeat. This will help the AM know the priority 
> and update JobStatus when the client asks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803213#comment-14803213
 ] 

Li Lu commented on YARN-3901:
-

Patch LGTM. Pending Jenkins. 

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value between all the cells of this 
> column will be written to the cell without any tag (empty tag) and all the 
> other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can 
> indicate running (1) or complete (2). In those cases (for metrics) only 
> complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are 
> completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow 
> numbers are retained in a separate column for historical tracking: we don’t 
> want to re-aggregate for those upon replay
> 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4170) AM need to be notified with priority in AllocateResponse

2015-09-17 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4170:
--
Attachment: 0001-YARN-4170.patch

Uploading an initial version of the patch. Kindly help review it.

> AM need to be notified with priority in AllocateResponse 
> -
>
> Key: YARN-4170
> URL: https://issues.apache.org/jira/browse/YARN-4170
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4170.patch
>
>
> As discussed in MAPREDUCE-5870, the Application Master needs to be notified of 
> the priority in the Allocate heartbeat. This will help the AM know the priority 
> and update JobStatus when the client asks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-17 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4140:
---
Attachment: 0005-YARN-4140.patch

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5-10 min for 500 containers.
> Of 3 machines in total, 2 were in the same partition, and the app was 
> submitted to that partition.
> After enabling debug logging I was able to find the below:
> # From the AM the container ask is for OFF_SWITCH
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a PI job, it took 17 minutes to allocate 
> the next container after AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2015-09-17 Thread Parvez (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803230#comment-14803230
 ] 

Parvez commented on YARN-914:
-

Hi,

I am facing issues when trying to resize an AWS EMR cluster which is 
configured with Hadoop 2.6.0.

Resizing works fine, but when decommissioning a node which has containers 
running on it, the entire EMR cluster stops functioning. On a resize request, 
EMR terminates a Task Node (EC2 instance) at random, without checking whether 
it has containers running on it or not.

Here YARN should move the containers and the job from one node to another, 
which it isn't doing, I suppose.

Could it be related to the issue listed here?

Please answer. Thank you.

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map output has not been fetched by the reducers of the job, these map tasks 
> will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4174) Fix javadoc warnings floating up from hbase

2015-09-17 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-4174.
---
   Resolution: Done
Fix Version/s: YARN-2928

This ended up getting fixed as part of YARN-3901.

> Fix javadoc warnings floating up from hbase 
> 
>
> Key: YARN-4174
> URL: https://issues.apache.org/jira/browse/YARN-4174
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Sangjin Lee
>Priority: Minor
> Fix For: YARN-2928
>
>
> As part of the patch for YARN-3901, [~sjlee0]  observed some (~200) javadoc 
> warnings that are coming from hbase classes. 
> We tried a bunch of things like making the FlowRunCoprocessor class non 
> public and excluding the package from the pom. If the class in made non 
> public, the table creation has an exception.
> {code}
> 206 warnings
> [WARNING] Javadoc Warnings
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestWALObserver.class):
>  warning: Cannot find annotation method 'value()' in type 'Category': class 
> file for org.junit.experimental.categories.Category not found
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test': class 
> file for org.junit.Test not found
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverScannerOpenHook.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> 

[jira] [Commented] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states

2015-09-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803206#comment-14803206
 ] 

Sunil G commented on YARN-4141:
---

Test case failures are *not* related. TestRMRestart failed due to 
"java.util.zip.ZipException: invalid code lengths set".

> Runtime Application Priority change should not throw exception for 
> applications at finishing states
> ---
>
> Key: YARN-4141
> URL: https://issues.apache.org/jira/browse/YARN-4141
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch, 
> 0003-YARN-4141.patch, 0004-YARN-4141.patch, 0005-YARN-4141.patch
>
>
> As suggested by [~jlowe] in 
> [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
>  , it would be good if YARN could suppress exceptions from 
> change-application-priority calls for applications in their finishing stages.
> Currently it is difficult for clients to handle this. The behavior will be 
> similar to the kill-application behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3127) Avoid timeline events during RM recovery or restart

2015-09-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803381#comment-14803381
 ] 

Naganarasimha G R commented on YARN-3127:
-

Hi [~xgong]/ [~ozawa],
I feel the issue is valid and needs to be fixed. If one of you can take a look 
at the approach and the patch I mentioned earlier, it would help get this JIRA 
moving.

> Avoid timeline events during RM recovery or restart
> ---
>
> Key: YARN-3127
> URL: https://issues.apache.org/jira/browse/YARN-3127
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.6.0, 2.7.1
> Environment: RM HA with ATS
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: AppTransition.png, YARN-3127.20150213-1.patch, 
> YARN-3127.20150329-1.patch, YARN-3127.20150624-1.patch
>
>
> 1. Start RM with HA and ATS configured and run some yarn applications
> 2. Once applications have finished successfully, start the timeline server
> 3. Now fail over HA from active to standby
> 4. Access the timeline server URL :/applicationhistory
> // Note: earlier an exception was thrown when accessed.
> Incomplete information is shown in the ATS web UI, i.e. attempt, container and 
> other information is not displayed.
> Also, even if the timeline server is started with the RM, on RM restart/recovery 
> ATS events for applications already existing in ATS are resent, which is 
> not required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4178:
-

 Summary: [storage implementation] app id as string can cause 
incorrect ordering
 Key: YARN-4178
 URL: https://issues.apache.org/jira/browse/YARN-4178
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Currently the app id is used in various places as part of row keys and in 
column names. However, it is treated as a string for the most part. This will 
cause a problem with ordering when the id portion of the app id rolls over to 
the next digit.

For example, "app_1234567890_100" will be considered *earlier* than 
"app_1234567890_99". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-09-17 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803201#comment-14803201
 ] 

Sunil G commented on YARN-4138:
---

Hi [~mding],
In the same case, if we have 2 incremental requests, like 2G --> 4G --> 8G, is 
it possible to have a success case where the container size reached 4G at some 
intermediate time? From the RM's point of view, the last incremental request is 
to reach 8G, but in the NM it may have already reached 4G. If we then get a 
decrease request to 6G, by this logic will the RM container size fall back to 
2G (even though 4G was actually attained in the NM)? Kindly let me know if I 
missed something in this analysis.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-17 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803384#comment-14803384
 ] 

Sangjin Lee commented on YARN-4074:
---

{quote}
In TimelineEntityReader#readMetrics it seems safe to assume that if we have 
more than one value that this is a TimelineMetric.Type.TIME_SERIES.
Conversely it doesn't have to be true though right? I guess we'll just assume 
that for timelines we'd never have just one value? I can't quite oversee the 
impact of incorrectly assuming TimelineMetric.Type.SINGLE_VALUE if only one 
value has been written to HBase yet.
{quote}

That's right. We discussed this some time ago, and we think it'd be safer if 
the metric type (single value vs. time series) were stored/persisted. But there 
are other dimensions of metrics we may need to store (e.g. long vs. float, 
whether to aggregate, etc.). Also, there is a question of what if users wrote 
inconsistent data. So, at that time we went with a simple decision that's 
currently there (the code you see in {{TimelineEntityReader}} is refactored out 
of {{HBaseTimelineReaderImpl}} so it's not new code).

We should come to a conclusion on how to store/encode various dimensions of 
metrics, but not as part of this JIRA.
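
To make the current behavior concrete, a rough sketch of the value-count 
fallback described in the quote above (method and variable names are 
approximate; this paraphrases, not quotes, the refactored reader code):

{code}
import java.util.NavigableMap;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric;

// Sketch: with no persisted metric type, infer it from the value count.
static TimelineMetric toMetric(String id, NavigableMap<Long, Number> values) {
  TimelineMetric metric = new TimelineMetric();
  metric.setId(id);
  if (values.size() > 1) {
    metric.setType(TimelineMetric.Type.TIME_SERIES);
  } else {
    // a lone value may still be a young time series; see the caveat above
    metric.setType(TimelineMetric.Type.SINGLE_VALUE);
  }
  metric.setValues(values);
  return metric;
}
{code}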

{quote}
Wrt. ApplicationRowKey: at some point (perhaps not this jira) we should 
consider making the app_id a compound object that is stored with a ? separator. 
The prefix (in most cases in yarn right now would be "application_") would be 
separate and the RM start time and the final numeric part would be stored as a 
numerical value with a separate Bytes.to... conversion.

Otherwise we'll end up getting incorrect order for rowkeys when the application 
id wraps to 10K and each power of ten after that. For example, lexically 
application_1442351767756_1 < application_1442351767756_

If we just access the application by specific key this doesn't matter, but if 
we do a row-scan and count on ordering to set an appropriate stop on the scan, 
we'll break things.
This happens on all rowkeys with the app_id in it.
{quote}

That's a good point. We need to fix this, or we'll have incorrect 
orders/results happening with queries. This impacts anywhere we rely on the app 
id order (as string). I'll file a separate JIRA to address this issue.
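
For illustration, a minimal sketch of such a compound key component, assuming 
HBase's org.apache.hadoop.hbase.util.Bytes utility; the class name and field 
layout are hypothetical, not the eventual patch:

{code}
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of a compound app-id row-key component. Fixed-width big-endian
// numeric parts make byte-wise row-key order match numeric order.
public final class AppIdKeyComponent {
  public static byte[] encode(String appId) {
    String[] parts = appId.split("_");  // e.g. ["application", ts, seq]
    return Bytes.add(
        Bytes.toBytes(parts[0] + "_"),               // textual prefix
        Bytes.toBytes(Long.parseLong(parts[1])),     // 8-byte cluster timestamp
        Bytes.toBytes(Integer.parseInt(parts[2])));  // 4-byte sequence number
  }
}
{code}

With fixed-width big-endian numeric parts, plain byte comparison of the row key 
matches the numeric order of the app id, so range scans with stop rows behave 
correctly when the sequence number wraps to 10K and each power of ten after 
that.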

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.007.patch, 
> YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch, 
> YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch, 
> YARN-4074-YARN-2928.POC.005.patch, YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-09-17 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4179:
-

 Summary: [reader implementation] support flow activity queries 
based on time
 Key: YARN-4179
 URL: https://issues.apache.org/jira/browse/YARN-4179
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Minor


This came up as part of YARN-4074 and YARN-4075.

Currently the only query pattern supported on the flow activity table is by 
cluster only, but it might be useful to support queries by cluster and a 
specific date or dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

2015-09-17 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803240#comment-14803240
 ] 

Sangjin Lee commented on YARN-3901:
---

Jenkins does kick in but for some reason, it cannot post the result to the 
JIRA. The result is the following:

-1 overall

| Vote |   Subsystem |  Runtime   | Comment

|  -1  |  pre-patch  |  16m 3s| Findbugs (version ) appears to be 
|  | || broken on YARN-2928.
|  +1  |@author  |  0m 0s | The patch does not contain any 
|  | || @author tags.
|  +1  | tests included  |  0m 0s | The patch appears to include 4 new 
|  | || or modified test files.
|  +1  |  javac  |  8m 26s| There were no new javac warning 
|  | || messages.
|  +1  |javadoc  |  10m 47s   | There were no new javadoc warning 
|  | || messages.
|  +1  |  release audit  |  0m 24s| The applied patch does not increase 
|  | || the total number of release audit
|  | || warnings.
|  +1  | checkstyle  |  0m 16s| There were no new checkstyle 
|  | || issues.
|  -1  | whitespace  |  0m 50s| The patch has 9 line(s) that end in 
|  | || whitespace. Use git apply
|  | || --whitespace=fix.
|  +1  |install  |  1m 39s| mvn install still works. 
|  +1  |eclipse:eclipse  |  0m 43s| The patch built with 
|  | || eclipse:eclipse.
|  +1  |   findbugs  |  0m 52s| The patch does not introduce any 
|  | || new Findbugs (version 3.0.0)
|  | || warnings.
|  +1  | yarn tests  |  2m 37s| Tests passed in 
|  | || hadoop-yarn-server-timelineservice.
|  | |  42m 43s   | 


|| Subsystem || Report/Notes ||

| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12756422/YARN-3901-YARN-2928.10.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b1960e0 |
| whitespace | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-timelineservice test log | 
/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9191/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |

I'll remove the whitespace as I commit it. +1?

> Populate flow run data in the flow_run & flow activity tables
> -
>
> Key: YARN-3901
> URL: https://issues.apache.org/jira/browse/YARN-3901
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: YARN-3901-YARN-2928.1.patch, 
> YARN-3901-YARN-2928.10.patch, YARN-3901-YARN-2928.2.patch, 
> YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, 
> YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, 
> YARN-3901-YARN-2928.7.patch, YARN-3901-YARN-2928.8.patch, 
> YARN-3901-YARN-2928.9.patch
>
>
> As per the schema proposed in YARN-3815 in 
> https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table. 
> Some points that are being  considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency 
> than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even 
> if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and 
> decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for 
> the applicationId. A coprocessor will return the min value of all written 
> values.
> - Upon flush and compactions, the min value 

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-17 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803398#comment-14803398
 ] 

Bibin A Chundatt commented on YARN-4140:


Thanks [~leftnoteasy] for the review and comments.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5-10 min for 500 containers.
> Of 3 machines in total, 2 were in the same partition, and the app was 
> submitted to that partition.
> After enabling debug logging I was able to find the below:
> # From the AM the container ask is for OFF_SWITCH
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a PI job, it took 17 minutes to allocate 
> the next container after AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)
>  



--
This 

[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-17 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4074:
--
Attachment: YARN-4074-YARN-2928.007.patch

v.7 patch posted.

This is now based on the YARN-2928 branch, since YARN-3901 has been resolved. 
Other than that, there are no real changes from the previous v.6 patch.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.007.patch, 
> YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch, 
> YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch, 
> YARN-4074-YARN-2928.POC.005.patch, YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2597) MiniYARNCluster doesn't propagate reason for AHS not starting

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804278#comment-14804278
 ] 

Hadoop QA commented on YARN-2597:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   6m 12s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 44s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m 20s | Tests passed in 
hadoop-yarn-server-tests. |
| | |  20m  3s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12671016/YARN-2597-001.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 3f82f58 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9195/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9195/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9195/console |


This message was automatically generated.

> MiniYARNCluster doesn't propagate reason for AHS not starting
> -
>
> Key: YARN-2597
> URL: https://issues.apache.org/jira/browse/YARN-2597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2597-001.patch
>
>
> If the AHS doesn't come up, your test run gets an exception telling you this 
> fact, but the underlying cause is not propagated.
> As YARN services do record their failure cause, extracting and propagating 
> this is trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804328#comment-14804328
 ] 

Hadoop QA commented on YARN-4074:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  29m 14s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |  11m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  14m 52s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 13s | The applied patch generated  1 
new checkstyle issues (total was 31, now 32). |
| {color:green}+1{color} | whitespace |   0m 35s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 50s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 49s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   5m 39s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m 13s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   2m  8s | Tests passed in 
hadoop-yarn-server-tests. |
| {color:green}+1{color} | yarn tests |   2m 32s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  75m 28s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-api |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757127/YARN-4074-YARN-2928.007.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 4b37985 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9194/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9194/console |


This message was automatically generated.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.007.patch, 
> YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch, 
> YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch, 
> YARN-4074-YARN-2928.POC.005.patch, YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-4178:
--

Assignee: Varun Saxena

> [storage implementation] app id as string can cause incorrect ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> Currently the app id is used in various places as part of row keys and in 
> column names. However, they are treated as strings for the most part. This 
> will cause a problem with ordering when the id portion of the app id rolls 
> over to the next digit.
> For example, "app_1234567890_100" will be considered *earlier* than 
> "app_1234567890_99". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType

2015-09-17 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804344#comment-14804344
 ] 

Anubhav Dhoot commented on YARN-4143:
-

I think we can minimize the impact of checking on every allocate by checking 
only on allocates before the AM is assigned, which should happen only a few 
times until the AM itself gets launched. This avoids adding an API to the 
scheduler.

> Optimize the check for AMContainer allocation needed by blacklisting and 
> ContainerType
> --
>
> Key: YARN-4143
> URL: https://issues.apache.org/jira/browse/YARN-4143
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> In YARN-2005, checks are made to determine whether an allocation is for an 
> AM container. This happens on every allocate call and should be optimized 
> away, since the answer changes only once per SchedulerApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType

2015-09-17 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-4143:

Attachment: YARN-4143.001.patch

The attached patch ensures the checks are done only while the AM is not yet 
allocated; once it is allocated, the method simply returns. It also removes 
passing the applicationId, which is redundant since we are checking only for 
this app.
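
To illustrate the idea, a minimal sketch (the class shape and names such as 
{{amContainerAllocated}} are hypothetical stand-ins, not the actual patch):

{code}
// Hypothetical sketch, not the actual patch: check for the AM container
// only until it has been allocated once, then short-circuit forever.
public class SchedulerAppAttemptSketch {
  private volatile boolean amContainerAllocated = false;

  /** Called on every allocate(); cheap once the AM container exists. */
  public boolean isWaitingForAMContainer() {
    return !amContainerAllocated;
  }

  /** Called once, when the scheduler hands out the AM container. */
  public void markAMContainerAllocated() {
    amContainerAllocated = true;
  }
}
{code}

The flag flips at most once per SchedulerApplicationAttempt, so every later 
allocate call pays only a boolean read.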

> Optimize the check for AMContainer allocation needed by blacklisting and 
> ContainerType
> --
>
> Key: YARN-4143
> URL: https://issues.apache.org/jira/browse/YARN-4143
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-4143.001.patch
>
>
> In YARN-2005, checks are made to determine whether an allocation is for an 
> AM container. This happens on every allocate call and should be optimized 
> away, since the answer changes only once per SchedulerApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804423#comment-14804423
 ] 

Vrushali C commented on YARN-4178:
--


In hRaven, we started out storing Hadoop job ids as a tuple of the JT/RM start 
time and the sequence number, exactly for this reason: to maintain the right 
ordering.

But this works only as long as the prefix for app ids is "application_". If 
something changes and we end up with a different prefix, then querying older 
data (older-format row keys) becomes harder.

Column name ordering may not be an issue, I think.

For row keys, where do we see this incorrect ordering? In the applications 
table? But I think there is a prefix of "user!cluster!flow!flowrunid!" on each 
row key before the application id, no?
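
Even with an identical prefix, though, the problem persists, because the 
comparison on the trailing app-id bytes is still lexicographic. A 
self-contained illustration (the prefix value here is made up):

{code}
// Demonstration only, not project code: even under the same
// "user!cluster!flow!flowrunid!" prefix, string app ids sort incorrectly.
public class AppIdOrderingDemo {
  public static void main(String[] args) {
    String prefix = "user!cluster!flow!1441791998224!";
    String a = prefix + "application_1234567890_99";
    String b = prefix + "application_1234567890_100";
    // Lexicographic comparison: '9' > '1', so the id ending in _99 sorts
    // AFTER _100, even though _100 is the later application.
    System.out.println(a.compareTo(b));  // prints a positive number
  }
}
{code}

So the prefix narrows the scan but does not fix the ordering within a single 
flow run.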



> [storage implementation] app id as string can cause incorrect ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> Currently the app id is used in various places as part of row keys and in 
> column names. However, they are treated as strings for the most part. This 
> will cause a problem with ordering when the id portion of the app id rolls 
> over to the next digit.
> For example, "app_1234567890_100" will be considered *earlier* than 
> "app_1234567890_99". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality

2015-09-17 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-1897:
--
Attachment: YARN-1897-7.patch

Thanks [~djp]!

bq.  Number of preempted containers won't be count as container failure in AM 
prospective and won't affect the success in application's running result.
Got it. Makes sense to simulate it separately.

bq. If we want to emulate the case NM get shutdown (kill -9) suddenly and never 
come back and its impact to RMContainers.
Interesting scenario. Yes, this simulation needs to be handled differently.

bq. I would prefer YARN-4131 to address 2nd sources event as an addendum to our 
approach here. What do you think?
Sounds good. I have several questions about the implementations in YARN-4131 
and can comment there.

Here is the updated patch, which addresses the test results for 
TestContainerManager. The TestNetworkedJob failure isn't related.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as 
> they are needed by other sub tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way to application to specify "reason" 
> for diagnosis. SignalContainerResponse might be empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-09-17 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4044:
--
Attachment: 0004-YARN-4044.patch

Thank you Naga for the comments.
Yes, for now we are always setting all information such as queue/priority, 
hence that if-check can be removed. Tests added as well. Uploading a new patch.

> Running applications information changes such as movequeue is not published 
> to TimeLine server
> --
>
> Key: YARN-4044
> URL: https://issues.apache.org/jira/browse/YARN-4044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch, 
> 0003-YARN-4044.patch, 0004-YARN-4044.patch
>
>
> SystemMetricsPublisher needs to expose an appUpdated API to publish any 
> change for a running application.
> Events can be:
> - change of queue for a running application.
> - change of application priority for a running application.
> This ticket intends to handle both RM and timeline side changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1480#comment-1480
 ] 

Li Lu commented on YARN-4178:
-

We can rely on the ApplicationId class in the YARN API to fix this, right? 
Since the app id is encapsulated in {{TimelineCollectorContext}}, shall we 
change the appId part into an ApplicationId-typed object and add an internal 
method to convert an ApplicationId to bytes for HBase storage? I suspect this 
is a whole-flow change if we want to use ApplicationId in 
TimelineCollectorContext. Let's try not to break ongoing patches with this 
change.

> [storage implementation] app id as string can cause incorrect ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> Currently the app id is used in various places as part of row keys and in 
> column names. However, they are treated as strings for the most part. This 
> will cause a problem with ordering when the id portion of the app id rolls 
> over to the next digit.
> For example, "app_1234567890_100" will be considered *earlier* than 
> "app_1234567890_99". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804452#comment-14804452
 ] 

Hadoop QA commented on YARN-4140:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 10s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  1s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 27s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 53s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  55m 41s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m 19s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757131/0005-YARN-4140.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3f82f58 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9196/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9196/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9196/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9196/console |


This message was automatically generated.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch
>
>
> While trying to run an application on a node-label partition, I found that 
> application execution time is delayed by 5–10 min for 500 containers. Of 3 
> machines in total, 2 were in the same partition, and the app was submitted 
> to that partition.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF-SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after the AM allocation.
> # Tested with about 1K maps using a Pi job; it took 17 minutes to allocate 
> the next container after the AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next 
> container allocation is done on OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  

[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804414#comment-14804414
 ] 

Varun Saxena commented on YARN-4178:


ApplicationId is basically a combination of a cluster timestamp and a 
monotonically increasing sequence number/id.
We can hence store the application id as a sequence of 2 longs or 2 ints in 
the row key to ensure ordering is maintained.

We can encode it on the way in and decode it back to a string on the way out 
using ApplicationId#toString.

We are, however, storing app attempt ids and container ids in the same way; 
they will go into the entity table.
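
A sketch of that encoding, assuming only ApplicationId's public getters 
({{getClusterTimestamp()}} and {{getId()}}); the codec class itself is 
hypothetical, not part of any patch:

{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical codec: big-endian long + int, so byte-wise row key ordering
// matches numeric ordering (both fields are non-negative).
public final class AppIdRowKeyCodec {
  private static final int LENGTH = 8 + 4;  // long timestamp + int sequence

  public static byte[] encode(ApplicationId appId) {
    return ByteBuffer.allocate(LENGTH)
        .putLong(appId.getClusterTimestamp())
        .putInt(appId.getId())
        .array();
  }

  public static ApplicationId decode(byte[] rowKeyPart) {
    ByteBuffer buf = ByteBuffer.wrap(rowKeyPart);
    // ApplicationId#toString on the result gives back "application_..." form.
    return ApplicationId.newInstance(buf.getLong(), buf.getInt());
  }
}
{code}

Because both fields are non-negative and stored big-endian, comparing the 
encoded bytes lexicographically gives the same order as comparing the numbers.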



> [storage implementation] app id as string can cause incorrect ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> Currently the app id is used in various places as part of row keys and in 
> column names. However, they are treated as strings for the most part. This 
> will cause a problem with ordering when the id portion of the app id rolls 
> over to the next digit.
> For example, "app_1234567890_100" will be considered *earlier* than 
> "app_1234567890_99". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804487#comment-14804487
 ] 

Joep Rottinghuis commented on YARN-4178:


[~vrushalic] we certainly have to store the application_ part.
[~gtCarrera9] for sure this can be done separately. We should do this in one 
fell swoop, in a consistent manner across the board.

If we do store the three parts separately, we should probably store the epoch 
timestamp first, then the app counter part (integer/long), and then the 
application_ prefix.
As far as I know, it is possible to imagine that the RM would hand out app ids 
differently for Spark, Tez, MR, or whatever the app framework asks for. I'd 
imagine that we would then have something like application__0001, 
spark__0002, application__0003, tez__0004 etc., where the numbers still 
increase for each subsequent app.

> [storage implementation] app id as string can cause incorrect ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> Currently the app id is used in various places as part of row keys and in 
> column names. However, they are treated as strings for the most part. This 
> will cause a problem with ordering when the id portion of the app id rolls 
> over to the next digit.
> For example, "app_1234567890_100" will be considered *earlier* than 
> "app_1234567890_99". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-17 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4074:
--
Attachment: YARN-4074-YARN-2928.008.patch

v.8 patch posted.

Fixed the checkstyle and findbugs issues.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.007.patch, 
> YARN-4074-YARN-2928.008.patch, YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch, 
> YARN-4074-YARN-2928.POC.004.patch, YARN-4074-YARN-2928.POC.005.patch, 
> YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-17 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-4180:
---

 Summary: AMLauncher does not retry on failures when talking to NM 
 Key: YARN-4180
 URL: https://issues.apache.org/jira/browse/YARN-4180
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


We see issues with the RM trying to launch a container while an NM is 
restarting, and we get exceptions like NMNotReadyException. While YARN-3842 
added retries for other clients of the NM (mainly AMs), they are not used by 
the AMLauncher in the RM, so these intermittent errors cause job failures. 
This can manifest during rolling restarts of NMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-09-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-4179:
--

Assignee: Varun Saxena

> [reader implementation] support flow activity queries based on time
> ---
>
> Key: YARN-4179
> URL: https://issues.apache.org/jira/browse/YARN-4179
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>Priority: Minor
>
> This came up as part of YARN-4074 and YARN-4075.
> Currently the only query pattern that's supported on the flow activity table 
> is by cluster only. But it might be useful to support queries by cluster and 
> certain date or dates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804597#comment-14804597
 ] 

Hadoop QA commented on YARN-4074:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 56s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  5s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 32s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 46s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   2m 29s | Tests passed in 
hadoop-yarn-server-tests. |
| {color:green}+1{color} | yarn tests |   2m 46s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  53m 56s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757157/YARN-4074-YARN-2928.008.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 4b37985 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9199/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9199/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9199/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9199/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9199/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9199/console |


This message was automatically generated.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.007.patch, 
> YARN-4074-YARN-2928.008.patch, YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch, 
> YARN-4074-YARN-2928.POC.004.patch, YARN-4074-YARN-2928.POC.005.patch, 
> YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-17 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804619#comment-14804619
 ] 

Anubhav Dhoot commented on YARN-4180:
-

I propose using retries in the ContainerManagement proxy used by 
AMLauncher#getContainerMgrProxy.
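
A rough sketch of what that could look like, assuming the 
{{NMProxy#createNMProxy}} entry point that YARN-3842 added; the method shape 
only loosely mirrors AMLauncher#getContainerMgrProxy and is not the actual 
patch:

{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.ContainerManagementProtocol;
import org.apache.hadoop.yarn.client.NMProxy;
import org.apache.hadoop.yarn.ipc.YarnRPC;

// Sketch only: route the RM's connection to the NM through NMProxy, which
// retries transient failures such as NMNotReadyException during NM restarts.
public class RetryingAMLauncherSketch {
  static ContainerManagementProtocol getContainerMgrProxy(
      Configuration conf, UserGroupInformation ugi, String nmHostPort) {
    InetSocketAddress nmAddr = NetUtils.createSocketAddr(nmHostPort);
    YarnRPC rpc = YarnRPC.create(conf);
    // NMProxy wraps the protocol with the configured retry policy instead
    // of failing on the first exception.
    return NMProxy.createNMProxy(
        conf, ContainerManagementProtocol.class, ugi, rpc, nmAddr);
  }
}
{code}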

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> We see issues with the RM trying to launch a container while an NM is 
> restarting, and we get exceptions like NMNotReadyException. While YARN-3842 
> added retries for other clients of the NM (mainly AMs), they are not used by 
> the AMLauncher in the RM, so these intermittent errors cause job failures. 
> This can manifest during rolling restarts of NMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804568#comment-14804568
 ] 

Hadoop QA commented on YARN-4044:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 15s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 35s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 36s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 54s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  53m 49s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 102m 27s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestAppPage |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.TestRMAdminService |
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
 |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757135/0004-YARN-4044.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3f82f58 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9197/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9197/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9197/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9197/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9197/console |


This message was automatically generated.

> Running applications information changes such as movequeue is not published 
> to TimeLine server
> --
>
> Key: YARN-4044
> URL: https://issues.apache.org/jira/browse/YARN-4044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch, 
> 0003-YARN-4044.patch, 0004-YARN-4044.patch
>
>
> SystemMetricsPublisher needs to expose an appUpdated API to publish any 
> change for a running application.
> Events can be:
> - change of queue for a running application.
> - change of application priority for a running application.
> This ticket intends to handle both RM and timeline side changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-09-17 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803287#comment-14803287
 ] 

MENG DING commented on YARN-4138:
-

Hi, [~sunilg]

The case you mentioned is covered. When the RM rolls back the container size 
to 2G, it does the equivalent of decreasing the container size to 2G, and will 
subsequently send a decrease message to the NM through the heartbeat response. 
The NM (currently enforcing 4G per your example), upon receiving the response, 
will decrease its enforcement to 2G. So eventually the resource view is 
consistent between the RM and the NM.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-09-17 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803340#comment-14803340
 ] 

Sangjin Lee commented on YARN-4074:
---

That's correct. In other words, those are used to do {{getEntity()}}. That's 
why I said "a reader for *single-entity reads*" (plural "reads"), as opposed to 
"a reader for a single entity read".

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.007.patch, 
> YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch, 
> YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch, 
> YARN-4074-YARN-2928.POC.005.patch, YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804776#comment-14804776
 ] 

Hadoop QA commented on YARN-1897:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 56s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 10 new or modified test files. |
| {color:red}-1{color} | javac |   7m 55s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m 26s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 14s | The applied patch generated  2 
new checkstyle issues (total was 32, now 34). |
| {color:green}+1{color} | whitespace |   2m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   8m 36s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests | 105m 19s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 30s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   7m  3s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m 11s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 31s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   7m 58s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  54m 52s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 236m 58s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.mapred.TestClusterMRNotification |
|   | hadoop.mapred.TestMRTimelineEventHandling |
|   | hadoop.mapred.TestYARNRunner |
|   | hadoop.mapred.TestLazyOutput |
|   | hadoop.mapred.TestClientServiceDelegate |
|   | hadoop.mapred.TestMRIntermediateDataEncryption |
|   | hadoop.mapreduce.v2.TestUberAM |
|   | hadoop.mapreduce.v2.TestMRJobsWithHistoryService |
|   | hadoop.mapreduce.v2.TestMROldApiJobs |
|   | hadoop.mapred.TestNetworkedJob |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
| Timed out tests | 
org.apache.hadoop.mapreduce.lib.jobcontrol.TestMapReduceJobControl |
|   | org.apache.hadoop.mapreduce.v2.TestMRJobs |
|   | org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner |
|   | org.apache.hadoop.mapreduce.v2.TestMRAMWithNonNormalizedCapabilities |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757152/YARN-1897-7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3f82f58 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9198/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 

[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-09-17 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804746#comment-14804746
 ] 

Subru Krishnan commented on YARN-3985:
--

Thanks [~adhoot] for the patch. LGTM; just a minor nit: 
*TestSchedulerPlanFollowerBase::testPlanFollowerRecovery()* is not used, so it 
can be removed.

Kindly open a JIRA for removing _updateReservation_ from the state store, as 
it's redundant.

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4135:

Hadoop Flags: Reviewed

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track a test failure in the log, since there is no 
> relation between the test case and the application id.
> Any thoughts?
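
For illustration, the kind of assertion message the description asks for, as a 
hypothetical sketch (names approximate, not the actual patch):

{code}
import org.junit.Assert;

// Hypothetical sketch: carry the application id in the wait-for-state
// assertion so a failure in the log can be tied back to its test case.
public class WaitForStateSketch {
  static void assertReachedState(String appId, String expected, String actual) {
    Assert.assertEquals(
        "App " + appId + " never reached the expected state", expected, actual);
  }
}
{code}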



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-17 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4135:

Priority: Trivial  (was: Minor)

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track a test failure in the log, since there is no 
> relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804953#comment-14804953
 ] 

Hudson commented on YARN-4135:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #405 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/405/])
YARN-4135. Improve the assertion message in MockRM while failing after waiting 
for the state.(Nijel S F via rohithsharmaks) (rohithsharmaks: rev 
723c31d45bc0f64b1d1a67350b108059d2a220a3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track a test failure in the log, since there is no 
> relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804961#comment-14804961
 ] 

Hudson commented on YARN-4135:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #412 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/412/])
YARN-4135. Improve the assertion message in MockRM while failing after waiting 
for the state.(Nijel S F via rohithsharmaks) (rohithsharmaks: rev 
723c31d45bc0f64b1d1a67350b108059d2a220a3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track a test failure in the log, since there is no 
> relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType

2015-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804898#comment-14804898
 ] 

Hadoop QA commented on YARN-4143:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 45s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   9m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 15s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 59s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  55m 48s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 100m 13s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12757137/YARN-4143.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6b97fa6 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9200/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9200/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9200/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9200/console |


This message was automatically generated.

> Optimize the check for AMContainer allocation needed by blacklisting and 
> ContainerType
> --
>
> Key: YARN-4143
> URL: https://issues.apache.org/jira/browse/YARN-4143
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-4143.001.patch
>
>
> In YARN-2005, checks are made to determine whether an allocation is for an 
> AM container. This happens on every allocate call and should be optimized 
> away, since the answer changes only once per SchedulerApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804904#comment-14804904
 ] 

Rohith Sharma K S commented on YARN-4135:
-

+1 lgtm

> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>  Labels: test
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track a test failure in the log, since there is no 
> relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4135) Improve the assertion message in MockRM while failing after waiting for the state.

2015-09-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804910#comment-14804910
 ] 

Hudson commented on YARN-4135:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #8475 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8475/])
YARN-4135. Improve the assertion message in MockRM while failing after waiting 
for the state.(Nijel S F via rohithsharmaks) (rohithsharmaks: rev 
723c31d45bc0f64b1d1a67350b108059d2a220a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt


> Improve the assertion message in MockRM while failing after waiting for the 
> state.
> --
>
> Key: YARN-4135
> URL: https://issues.apache.org/jira/browse/YARN-4135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
>  Labels: test
> Fix For: 2.8.0
>
> Attachments: YARN-4135_1.patch, YARN-4135_2.patch
>
>
> In MockRM, when a test fails after waiting for the given state, the 
> application id or the attempt id can be printed for easier debugging.
> As of now it is hard to track a test failure in the log, since there is no 
> relation between the test case and the application id.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-09-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804943#comment-14804943
 ] 

Naganarasimha G R commented on YARN-4044:
-

Hi [~sunilg],
Thanks for updating the patch. A few nits:
# As discussed earlier, we are depending on a feature of the HistoryServer to 
return the latest event. It would be better to cover this in our test case so 
that, if that feature breaks, our test case can catch it, i.e. add additional 
events modifying the queue / priority (perhaps one modification for the queue 
and another for the priority).
# ApplicationUpdatedEvent has overridden {{hashCode}} but not {{equals}}. In 
the first place, do we need to override the hashCode method at all?
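
On the second nit: if {{hashCode}} stays overridden, the usual contract is to 
override {{equals}} over the same fields (or drop both). A generic, 
self-contained sketch, with field names that are hypothetical rather than 
taken from the patch:

{code}
import java.util.Objects;

// Illustrative only: keep equals() and hashCode() over the same fields.
public class ApplicationUpdatedEventSketch {
  private final String applicationId;
  private final String queue;
  private final int priority;

  public ApplicationUpdatedEventSketch(String applicationId, String queue,
      int priority) {
    this.applicationId = applicationId;
    this.queue = queue;
    this.priority = priority;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ApplicationUpdatedEventSketch)) {
      return false;
    }
    ApplicationUpdatedEventSketch other = (ApplicationUpdatedEventSketch) o;
    return priority == other.priority
        && Objects.equals(applicationId, other.applicationId)
        && Objects.equals(queue, other.queue);
  }

  @Override
  public int hashCode() {
    // Must use exactly the fields equals() compares.
    return Objects.hash(applicationId, queue, priority);
  }
}
{code}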

> Running applications information changes such as movequeue is not published 
> to TimeLine server
> --
>
> Key: YARN-4044
> URL: https://issues.apache.org/jira/browse/YARN-4044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch, 
> 0003-YARN-4044.patch, 0004-YARN-4044.patch
>
>
> SystemMetricsPublisher needs to expose an appUpdated API to publish any 
> change for a running application.
> Events can be:
> - change of queue for a running application.
> - change of application priority for a running application.
> This ticket intends to handle both RM and timeline side changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4167) NPE on RMActiveServices#serviceStop when store is null

2015-09-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804952#comment-14804952
 ] 

Rohith Sharma K S commented on YARN-4167:
-

+1 lgtm.. pending jenkins
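
For reference, a minimal, self-contained sketch of the null guard the stack 
trace below calls for ({{store}} here is a stand-in for the actual 
RMActiveServices field, not the real code):

{code}
import java.io.Closeable;
import java.io.IOException;

// Sketch of the guard needed in RMActiveServices#serviceStop: the store
// field is still null when serviceInit fails before creating it.
public class ServiceStopGuardSketch {
  private Closeable store;   // may never be assigned if init failed early

  protected void serviceStop() {
    if (store != null) {     // the missing null check that caused the NPE
      try {
        store.close();
      } catch (IOException e) {
        System.err.println("Error closing store: " + e);
      }
    }
  }
}
{code}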

> NPE on RMActiveServices#serviceStop when store is null
> --
>
> Key: YARN-4167
> URL: https://issues.apache.org/jira/browse/YARN-4167
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4167.patch
>
>
> Configure 
> {{yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs}} 
> so that it mismatches {{yarn.nm.liveness-monitor.expiry-interval-ms}}.
> On startup, an NPE is thrown in {{RMActiveServices#serviceStop}}:
> {noformat}
> 2015-09-16 12:23:29,504 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED; cause: 
> java.lang.IllegalArgumentException: 
> yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs should 
> be more than 3 X yarn.nm.liveness-monitor.expiry-interval-ms
> java.lang.IllegalArgumentException: 
> yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs should 
> be more than 3 X yarn.nm.liveness-monitor.expiry-interval-ms
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.(RMContainerTokenSecretManager.java:82)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService.createContainerTokenSecretManager(RMSecretManagerService.java:109)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService.(RMSecretManagerService.java:57)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createRMSecretManagerService(ResourceManager.java:)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:423)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:963)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1193)
> 2015-09-16 12:23:29,507 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error closing 
> store.
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:608)
>  at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>  at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>  at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:963)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1193
> {noformat}
> *Impact Area*: RM failover with wrong configuration
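
A minimal sketch of the null guard this points to, assuming the state-store 
field in RMActiveServices is named {{store}} (an illustrative name, not taken 
from the source):

{code}
@Override
protected void serviceStop() throws Exception {
  // serviceInit can fail (e.g. on the token-rolling misconfiguration above)
  // before the state store is ever created, so guard against null before
  // closing it.
  if (store != null) {
    store.close();
  }
  super.serviceStop();
}
{code}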



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4180:
---
Affects Version/s: 2.7.1
 Target Version/s: 2.7.2
 Priority: Critical  (was: Major)

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
>
> We see issues where the RM tries to launch a container while an NM is 
> restarting and gets exceptions like NMNotReadyException. While YARN-3842 
> added retry for other clients of the NM (mainly AMs), it is not used by the 
> AMLauncher in the RM, so these intermittent errors cause job failures. This 
> can manifest during a rolling restart of NMs. 
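
A hedged sketch of what retry wiring for the AM launch path could look like, 
using Hadoop's generic {{org.apache.hadoop.io.retry}} utilities; the factory 
class and the {{rawProxy}} parameter are placeholders for illustration, and 
the actual fix may instead reuse the retry-enabled proxy path that YARN-3842 
introduced:

{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;
import org.apache.hadoop.yarn.api.ContainerManagementProtocol;

public class RetryingNMProxyFactory {
  // Wrap an existing NM protocol proxy so transient launch failures
  // (e.g. NMNotReadyException while an NM restarts) are retried rather
  // than surfacing as AM launch failures.
  public static ContainerManagementProtocol wrap(
      ContainerManagementProtocol rawProxy) {
    RetryPolicy retryPolicy =
        RetryPolicies.retryUpToMaximumCountWithFixedSleep(
            10, 1000, TimeUnit.MILLISECONDS);
    return (ContainerManagementProtocol) RetryProxy.create(
        ContainerManagementProtocol.class, rawProxy, retryPolicy);
  }
}
{code}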



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4152) NM crash with NPE when LogAggregationService#stopContainer called for absent container

2015-09-17 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804967#comment-14804967
 ] 

Rohith Sharma K S commented on YARN-4152:
-

Can you extract {{context.getContainers().get(containerId)}} to a variable and 
reuse it later?

> NM crash with NPE when LogAggregationService#stopContainer called for absent 
> container
> --
>
> Key: YARN-4152
> URL: https://issues.apache.org/jira/browse/YARN-4152
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-4152.patch
>
>
> NM crashed during log aggregation.
> Ran a Pi job with 500 containers and killed the application partway through.
> *Logs*
> {code}
> 2015-09-12 18:44:25,597 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_e51_1442063466801_0001_01_99 is : 143
> 2015-09-12 18:44:25,670 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101
> 2015-09-12 18:44:25,670 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_e51_1442063466801_0001_01_000101 from application 
> application_1442063466801_0001
> 2015-09-12 18:44:25,670 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:456)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:68)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_STOP for appId application_1442063466801_0001
> 2015-09-12 18:44:25,692 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2015-09-12 18:44:25,692 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=dsperf   
> OPERATION=Container Finished - SucceededTARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1442063466801_0001
> CONTAINERID=container_e51_1442063466801_0001_01_000100
> {code}
> *Analysis*
> Looks like for absent container also {{stopContainer}} is called 
> {code}
>   case CONTAINER_FINISHED:
> LogHandlerContainerFinishedEvent containerFinishEvent =
> (LogHandlerContainerFinishedEvent) event;
> stopContainer(containerFinishEvent.getContainerId(),
> containerFinishEvent.getExitCode());
> break;
> {code}
> *Event EventType: KILL_CONTAINER sent to absent container 
> container_e51_1442063466801_0001_01_000101*
> Should skip when {{context.getContainers().get(containerId)}} is {{null}} 
> (see the sketch below).
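
A minimal sketch of that guard inside LogAggregationService#stopContainer, 
also extracting the lookup to a local variable as suggested in the comment 
above (names follow the snippet in the description; the surrounding method 
and logger are assumed):

{code}
// Look the container up once and reuse the result; a KILL_CONTAINER for an
// absent container means the lookup can legitimately return null, so skip
// instead of dereferencing it.
Container container = context.getContainers().get(containerId);
if (container == null) {
  LOG.warn("Ignoring CONTAINER_FINISHED for absent container " + containerId);
  return;
}
{code}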



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

