[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322923#comment-14322923 ] Devaraj K commented on YARN-1299: - Thanks [~ozawa] for review. Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
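For illustration, a minimal sketch of the kind of change under review here (the exact message wording in the committed patch may differ; applicationId is assumed to be the AppSchedulingInfo field):
{code}
// AppSchedulingInfo (sketch): include the application id in the message so
// that RM log lines can be correlated with a specific application.
LOG.info("checking for deactivate of application: " + applicationId);
{code}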
[jira] [Commented] (YARN-1299) Improve a log message in AppSchedulingInfo by adding application id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323017#comment-14323017 ] Hudson commented on YARN-1299: -- FAILURE: Integrated in Hadoop-trunk-Commit #7120 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7120/]) Revert YARN-1299. Improve a log message in AppSchedulingInfo by adding application id. Contributed by Ashutosh Jindal and devaraj. (ozawa: rev 3f32357c368f4efac33835d719641c961f93a0be) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java YARN-1299. Improve a log message in AppSchedulingInfo by adding application id. Contributed by Ashutosh Jindal and Devaraj K. (ozawa: rev 556386a07084b70a5d2ae0c2bd4445a348306db8) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java Improve a log message in AppSchedulingInfo by adding application id --- Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323050#comment-14323050 ] zhihai xu commented on YARN-1778: - [~ozawa], that is a good idea. Although we can increase dfs.client.block.write.locateFollowingBlock.retries in the configuration file, and FileSystemRMStateStore will pick up the change in startInternal from the configuration file in the following code, it would affect all the other modules as well. That may not be feasible. {code} Configuration conf = new Configuration(getConfig()); fs = fsWorkingPath.getFileSystem(conf); {code} To increase the flexibility, we can create a new configuration to customize dfs.client.block.write.locateFollowingBlock.retries for FileSystemRMStateStore, similar to how FS_RM_STATE_STORE_RETRY_POLICY_SPEC customizes dfs.client.retry.policy.spec for FileSystemRMStateStore in the following code from startInternal: {code} String retryPolicy = conf.get(YarnConfiguration.FS_RM_STATE_STORE_RETRY_POLICY_SPEC, YarnConfiguration.DEFAULT_FS_RM_STATE_STORE_RETRY_POLICY_SPEC); conf.set("dfs.client.retry.policy.spec", retryPolicy); {code} I will implement a new patch based on this. Thanks for the suggestion. zhihai TestFSRMStateStore fails on trunk - Key: YARN-1778 URL: https://issues.apache.org/jira/browse/YARN-1778 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: zhihai xu Attachments: YARN-1778.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
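A sketch of what the proposed override could look like in startInternal; the YARN key name and default value below are illustrative placeholders, not committed constants:
{code}
// Sketch (hypothetical key and default): mirror the FS_RM_STATE_STORE_RETRY_POLICY_SPEC
// pattern so only the RM state store's DFS client gets the larger retry count.
int blockWriteRetries = conf.getInt(
    "yarn.resourcemanager.fs.state-store.block-write-retries", 10);
conf.setInt("dfs.client.block.write.locateFollowingBlock.retries",
    blockWriteRetries);
fs = fsWorkingPath.getFileSystem(conf);
{code}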
[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322973#comment-14322973 ] Tsuyoshi OZAWA commented on YARN-1299: -- The Findbugs warnings are not related to the patch. We don't need tests since this is an improvement to a log message. Committing this shortly. Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322970#comment-14322970 ] Hadoop QA commented on YARN-1299: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699109/YARN-1299.patch against trunk revision 447bd7b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6644//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6644//console This message is automatically generated. Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
Brahma Reddy Battula created YARN-3204: -- Summary: Fix new findbug warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Please check the following findbug report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323051#comment-14323051 ] Varun Saxena commented on YARN-3204: Linking it to YARN-3181 Fix new findbug warnings in hadoop-yarn-server-resourcemanager (resourcemanager.scheduler.fair) - Key: YARN-3204 URL: https://issues.apache.org/jira/browse/YARN-3204 Project: Hadoop YARN Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Please check the following findbug report: https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1299) Improve a log message in AppSchedulingInfo by adding application id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1299: - Summary: Improve a log message in AppSchedulingInfo by adding application id (was: Improve 'checking for deactivate...' log message by adding app id) Improve a log message in AppSchedulingInfo by adding application id --- Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1299) Improve a log message in AppSchedulingInfo by adding application id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322997#comment-14322997 ] Hudson commented on YARN-1299: -- FAILURE: Integrated in Hadoop-trunk-Commit #7119 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7119/]) YARN-1299. Improve a log message in AppSchedulingInfo by adding application id. Contributed by Ashutosh Jindal and devaraj. (ozawa: rev 9aae81c93421874b726c7b6ff970895c429e502d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java Improve a log message in AppSchedulingInfo by adding application id --- Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322929#comment-14322929 ] Devaraj K commented on YARN-3197: - {code:xml} protected synchronized void completedContainer(RMContainer rmContainer, ContainerStatus containerStatus, RMContainerEventType event) { if (rmContainer == null) { LOG.info("Null container completed..."); return; } {code} Here this log can be updated with the containerId from ContainerStatus along with a more meaningful message. Confusing log generated by CapacityScheduler Key: YARN-3197 URL: https://issues.apache.org/jira/browse/YARN-3197 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Varun Saxena Priority: Minor 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:39,968 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... 2015-02-12 20:35:40,960 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
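A sketch of the suggested change (the exact message wording is up for review):
{code}
// Sketch: name the completed container so the log line is actionable.
if (rmContainer == null) {
  LOG.info("Container " + containerStatus.getContainerId()
      + " completed with event " + event
      + ", but corresponding RMContainer doesn't exist.");
  return;
}
{code}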
[jira] [Updated] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1299: - Fix Version/s: 2.7.0 Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Fix For: 2.7.0 Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In the RM log, it gives a message saying 'checking for deactivate...'. This log message would be more meaningful if it contained the app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323076#comment-14323076 ] Craig Welch commented on YARN-2495: --- My point is that everything necessary to manage labels properly exists without DECENTRALIZED_CONFIGURATION_ENABLED; it is a duplication of existing functionality. The user controls this by: 1. choosing to specify or not specify a way of managing the nodes at the node manager 2. choosing to set or not set node labels and associations using the centralized APIs. Ergo, DECENTRALIZED_CONFIGURATION_ENABLED is completely redundant; it provides no capabilities not already present. Users will need to understand how the feature works to use it effectively anyway; there is no value added by requiring that they repeat themselves (both by specifying a way of determining node labels at the node manager level and by having to set this switch). My prediction is that, if the switch is present, its chief function will be to confuse and annoy users when they set up a configuration for the node managers to generate node labels and then the labels don't appear in the cluster as they expect them to. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels in each NM; this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using a script suggested by [~aw] (YARN-2729)) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-65) Reduce RM app memory footprint once app has completed
[ https://issues.apache.org/jira/browse/YARN-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K reassigned YARN-65: - Assignee: Devaraj K Reduce RM app memory footprint once app has completed - Key: YARN-65 URL: https://issues.apache.org/jira/browse/YARN-65 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Devaraj K The ResourceManager holds onto a configurable number of completed applications (yarn.resourcemanager.max-completed-applications, defaults to 10000), and the memory footprint of these completed applications can be significant. For example, the {{submissionContext}} in RMAppImpl contains references to protocolbuffer objects and other items that probably aren't necessary to keep around once the application has completed. We could significantly reduce the memory footprint of the RM by releasing objects that are no longer necessary once an application completes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3076) YarnClient implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3076: --- Attachment: YARN-3076.003.patch YarnClient implementation to retrieve label to node mapping --- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3194) After NM restart, completed containers are not released which are sent during NM registration
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3194: - Attachment: 0001-yarn-3194-v1.patch Attached the version-1 patch. The patch does the following: # Added ReconnectedEvent to process NMContainerStatus if applications are running on the node. Kindly review the patch. After NM restart, completed containers are not released which are sent during NM registration Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Attachments: 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatus to the RM. But the RM processes only ContainerState.RUNNING. If a container completed while the NM was down, those containers' resources won't be released, which results in applications hanging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
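Roughly, the fix direction is for the RM to also hand completed containers reported at registration to the scheduler; a sketch under assumed names (handleNMContainerStatus is an illustrative helper, not necessarily the patch's method):
{code}
// Sketch: on NM (re)registration, don't ignore COMPLETE container reports;
// release their resources instead of processing only RUNNING ones.
for (NMContainerStatus status : request.getNMContainerStatuses()) {
  if (status.getContainerState() == ContainerState.COMPLETE) {
    handleNMContainerStatus(status);  // assumed helper: forwards to the scheduler
  }
}
{code}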
[jira] [Commented] (YARN-3041) [Data Model] create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323157#comment-14323157 ] Naganarasimha G R commented on YARN-3041: - # After having HierarchicalTimelineEntity, do we require isRelatedToEntities and relatesToEntities in TimelineEntity, or vice versa? {quote} private Set<TimelineEntity> isRelatedToEntities = new HashSet<>(); private Set<TimelineEntity> relatesToEntities = new HashSet<>(); {quote} # If any entity data cannot be updated on subsequent posts of timeline entities, it is better to capture that beforehand; for example, if we are inserting configs of a timeline entity only during creation of a new TimelineEntity... # Regarding metrics: TimelineEntity has a set of TimelineMetric, and TimelineMetric has {quote} private String id; private Map<String, Object> info = new HashMap<>(); private Object singleData; private Map<Long, Object> timeSeries = new LinkedHashMap<>(); {quote} #* What's the purpose of info? Can we rename it to metadata? #* Are all objects stored in the backend via serialization and deserialization, or as JSON strings? #* If the metric value is an Object, then how are (primary) aggregations done? Is info responsible for capturing this information? #* IIUC, for a time-series metric {{singleData}} will be null and {{timeSeries}} will have values, and for a non-time-series metric vice versa. [Data Model] create the ATS entity/event API Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: Data_model_proposal_v2.pdf, YARN-3041.2.patch, YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs
[ https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2031: - Attachment: YARN-2031-002.patch This is an iteration which implements part of the feature; not complete, but posted for interim review. # The AmIpFilter now redirects with the relevant verb, as tested. # The proxy is lined up for it, except that it still only registers support for GET. # The redirect code in ProxyUtils is now method aware. There's some complexity in the proxy related to redirect policy for YARN pages and user click-throughs. h3. Click throughs: How to handle the click-through warning on non-GET operations. Current policy: reject with 401. The warn logic could also probe the accepted types of the GET, and 401 on anything that wanted XML or JSON, so app APIs would fail fast. Thoughts? h3. Redirecting to RM pages vs app pages RM pages are: the app-not-registered redirect to the RM page, or the app-completed redirect to the logs. For GET operations, these are redirected as today, with a 302. For other verbs, a 404 on the original URL is returned. This is designed to fail when an app isn't running, either not-started or completed. YARN Proxy model doesn't support REST APIs in AMs - Key: YARN-2031 URL: https://issues.apache.org/jira/browse/YARN-2031 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2031-002.patch, YARN-2031.patch.001 AMs can't support REST APIs because # the AM filter redirects all requests to the proxy with a 302 response (not 307) # the proxy doesn't forward PUT/POST/DELETE verbs Either the AM filter needs to return 307 and the proxy to forward the verbs, or the AM filter should not filter the REST bit of the web site -- This message was sent by Atlassian JIRA (v6.3.4#6332)
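To illustrate the verb-aware redirect idea (a sketch, not the actual AmIpFilter/ProxyUtils code; request, response, and redirectUrl are assumed to be in scope):
{code}
// Sketch: 302 is fine for GET/HEAD, but other verbs need 307 so clients
// replay the request against the new location with the same method and body.
String method = request.getMethod();
if ("GET".equals(method) || "HEAD".equals(method)) {
  response.setStatus(HttpServletResponse.SC_FOUND);               // 302
} else {
  response.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);  // 307
}
response.setHeader("Location", redirectUrl);
{code}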
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323124#comment-14323124 ] Junping Du commented on YARN-914: - Thanks [~jlowe] for review and comments! bq. Nit: How about DECOMMISSIONING instead of DECOMMISSION_IN_PROGRESS? Sounds good. Will update it later. bq. We should remove its available (not total) resources from the cluster then continue to remove available resources as containers complete on that node. That's a very good point. Yes, we should update resources in this way. bq. As for the UI changes, initial thought is that decommissioning nodes should still show up in the active nodes list since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jquery table to sort/search for nodes in that state from the active nodes list if it's too crowded to add yet another node state tab (or maybe get rid of some effectively dead tabs like the reboot state tab). Makes sense. Will add it to the proposal and we can discuss more details on the UI JIRA later. bq. For the NM restart open question, this should no longer be an issue now that the NM is unaware of graceful decommission. Right. bq. For the AM dealing with being notified of decommissioning, again I think this should just be treated like a strict preemption for the short term. IMHO all the AM needs to know is that the RM is planning on taking away those containers, and what the AM should do about it is similar whether the reason for removal is preemption or decommissioning. bq. Back to the long running services delaying decommissioning concern, does YARN even know the difference between a long-running container and a normal container? I am afraid not now. YARN-1039 should be a start to do the differentiation. bq. If it doesn't, how is it supposed to know a container is not going to complete anytime soon? Even a normal container could run for many hours. It seems to me the first thing we would need before worrying about this scenario is the ability for YARN to know/predict the expected runtime of containers. I think prediction of the expected runtime of containers could be hard in YARN's case. However, can we typically say long running service containers are expected to run very long or infinitely? If so, notifying the AM to preempt containers of LRS makes more sense than waiting for a timeout, doesn't it? bq. There's still an open question about tracking the timeout RM side instead of NM side. Sounds like the NM side is not going to be pursued at this point, and we're going with no built-in timeout support in YARN for the short-term. That was unclear at the beginning of the discussion but is much clearer now; I will remove this part. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs are not fetched by the reducers of the job, these map tasks will need to be rerun as well. 
We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs
[ https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323137#comment-14323137 ] Hadoop QA commented on YARN-2031: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699147/YARN-2031-002.patch against trunk revision 814afa4. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6646//console This message is automatically generated. YARN Proxy model doesn't support REST APIs in AMs - Key: YARN-2031 URL: https://issues.apache.org/jira/browse/YARN-2031 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2031-002.patch, YARN-2031.patch.001 AMs can't support REST APIs because # the AM filter redirects all requests to the proxy with a 302 response (not 307) # the proxy doesn't forward PUT/POST/DELETE verbs Either the AM filter needs to return 307 and the proxy to forward the verbs, or the AM filter should not filter the REST bit of the web site -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323154#comment-14323154 ] Naganarasimha G R commented on YARN-3034: - Hi [~sjlee0] [~zjshen], thanks for reviewing the patch. bq. If aggregator is able to handle the requests in the async way, I'm okay to use rmcontext dispatcher. Otherwise, let's make sure at least we're using a separate async dispatcher. +1 for having a separate async dispatcher, as we are not planning to handle container events in RM anyway. bq. this creates a dependency from RM to the timeline service; perhaps it is unavoidable... Based on the discussions we had over the last week, I understand that RM and NM should not be directly dependent on TimelineService. But based on the YARN-3030 patch, BaseAggregatorService.java is in the timeline service project; hence, where should this RMTimelineAggregator.java class be placed (as it extends BaseAggregatorService)? If we plan to handle it similar to the current approach, i.e. send the entity data through a REST client to a timeline writer service (RMTimelineAggregator), where should this service be running, i.e. as part of which process, or should it be a daemon on its own? Other queries: # Is RMTimelineAggregator expected to do any primary (preliminary) aggregation of some metrics? I just wanted to know the reason for having a specific TimelineAggregator for RM separately. Similarly for NM/Applications too: what if there are no primary aggregations and we just want to push the entity data to ATS; in these cases do we require separate per-app service handling? # User and Queue entities have been newly added in the YARN-3041 data model proposal: IIUC, RM needs to add User and Queue entities when an application is created if the specified user and queue don't exist as entities in ATS? Apart from this, the Queue entity has parent-queue information; is it something like when CS/FS is initialized we need to create entities for new queues and hierarchies? Is it not sufficient to just have a Leaf Queue entity with the parent path as its meta info; is the hierarchy required? Based on clarification of these points, I can rework the patch along with fixing other small issues. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1514: - Attachment: YARN-1514.7.patch * Fixing the state-management bug (cleanup works well). * Removing ZK_TIMEOUT_MS. * Using ContainerId.newContainerId instead of ContainerId.newInstance. * Fixing up default values more naturally: ZK_PERF_NUM_APP_DEFAULT is 1000, ZK_PERF_NUM_APPATTEMPT_PER_APP is 10. About the excessive log messages: they can be suppressed with the hadoop --loglevel option: {code} $ bin/hadoop --loglevel fatal jar ../../../hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/target/hadoop-yarn-server-resourcemanager-3.0.0-SNAPSHOT-tests.jar TestZKRMStateStorePerf -appSize 5 -appAttemptSize 10 -workingZnode /Test3 ZKRMStateStore takes 39 msec to loadState. {code} Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when an RM-HA cluster does failover. Therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3076) YarnClient implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323211#comment-14323211 ] Hadoop QA commented on YARN-3076: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699124/YARN-3076.003.patch against trunk revision 447bd7b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.conf.TestJobConf Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6645//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6645//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6645//console This message is automatically generated. YarnClient implementation to retrieve label to node mapping --- Key: YARN-3076 URL: https://issues.apache.org/jira/browse/YARN-3076 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3076.001.patch, YARN-3076.002.patch, YARN-3076.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2832) Wrong Check Logic of NodeHealthCheckerService Causes Latent Errors
[ https://issues.apache.org/jira/browse/YARN-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323092#comment-14323092 ] Devaraj K commented on YARN-2832: - Nice catch [~tianyin]. Thanks for your contribution. The patch looks good to me except these comments. - Can you change the log level to INFO with a log message similar to the one in NodeHealthScriptRunner.serviceStart(): {code} LOG.info("Not starting node health monitor"); {code} - And also can you remove the redundant shouldRun() check in NodeHealthScriptRunner.serviceStart(). Wrong Check Logic of NodeHealthCheckerService Causes Latent Errors -- Key: YARN-2832 URL: https://issues.apache.org/jira/browse/YARN-2832 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1, 2.5.1 Environment: Any environment Reporter: Tianyin Xu Attachments: health.check.service.1.patch NodeManager allows users to specify the health checker script that will be invoked by the health-checker service via the configuration parameter, _yarn.nodemanager.health-checker.script.path_. During the _serviceInit()_ of the health-check service, NM checks whether the parameter is set correctly using _shouldRun()_, as follows: {code:title=/* NodeHealthCheckerService.java */|borderStyle=solid} protected void serviceInit(Configuration conf) throws Exception { if (NodeHealthScriptRunner.shouldRun(conf)) { nodeHealthScriptRunner = new NodeHealthScriptRunner(); addService(nodeHealthScriptRunner); } addService(dirsHandler); super.serviceInit(conf); } {code} The problem is that if the parameter is misconfigured (e.g., permission problem, wrong path), NM does not have any log message to inform users, which could cause latent errors or mysterious problems (e.g., why does my script not work?). I see the checking and printing logic is put in the _serviceStart()_ function in _NodeHealthScriptRunner.java_ (see the following code snippets). However, the logic is wrong. For an incorrect parameter that does not pass the shouldRun check, _serviceStart()_ would never be called because the _NodeHealthScriptRunner_ instance does not have the chance to be created (see the code snippets above). {code:title=/* NodeHealthScriptRunner.java */|borderStyle=solid} protected void serviceStart() throws Exception { // if health script path is not configured don't start the thread. if (!shouldRun(conf)) { LOG.info("Not starting node health monitor"); return; } ... } {code} Basically, I think the checking and printing logic should be put in serviceInit() in NodeHealthCheckerService instead of serviceStart() in NodeHealthScriptRunner. See the attachment for the simple patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
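Combining the reporter's patch direction with the review comments, the serviceInit change might look roughly like this (a sketch, not the final patch):
{code}
// NodeHealthCheckerService#serviceInit (sketch): log at INFO when the
// health-script configuration doesn't pass the shouldRun() check.
protected void serviceInit(Configuration conf) throws Exception {
  if (NodeHealthScriptRunner.shouldRun(conf)) {
    nodeHealthScriptRunner = new NodeHealthScriptRunner();
    addService(nodeHealthScriptRunner);
  } else {
    LOG.info("Not starting node health monitor");
  }
  addService(dirsHandler);
  super.serviceInit(conf);
}
{code}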
[jira] [Commented] (YARN-3194) After NM restart, completed containers are not released which are sent during NM registration
[ https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323236#comment-14323236 ] Hadoop QA commented on YARN-3194: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699148/0001-yarn-3194-v1.patch against trunk revision 814afa4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6647//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6647//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6647//console This message is automatically generated. After NM restart, completed containers are not released which are sent during NM registration Key: YARN-3194 URL: https://issues.apache.org/jira/browse/YARN-3194 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: NM restart is enabled Reporter: Rohith Assignee: Rohith Attachments: 0001-yarn-3194-v1.patch On NM restart, the NM sends all the outstanding NMContainerStatus to the RM. But the RM processes only ContainerState.RUNNING. If a container completed while the NM was down, those containers' resources won't be released, which results in applications hanging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323315#comment-14323315 ] Hadoop QA commented on YARN-1514: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699153/YARN-1514.7.patch against trunk revision 814afa4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6648//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6648//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6648//console This message is automatically generated. Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when an RM-HA cluster does failover. Therefore, its execution time impacts the failover time of RM-HA. We need a utility to benchmark the execution time of ZKRMStateStore#loadState as a development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323700#comment-14323700 ] Naganarasimha G R commented on YARN-3040: - Hi [~rkanter] Some queries related to tags and this JIRA: # IIUC, users create the Flow and Flow run entities externally and just give these ids as tags at the time of app submission, so during creation of the app we ensure hierarchies are updated properly. If my understanding is correct, then what's the way a user can create Flow, Flow run and Cluster? Or is it that all the data related to the Flow, Flow run and Cluster is passed as part of tags, and if it's not present we need to create the entities for them at the time of app submission? # Hopefully the limitations of tags (100-character size and ASCII-only support) should not be a concern for passing the information to YARN, but it is better to capture this if we are considering tags as the interface for passing flow and flow run information. # IMHO I would have liked an explicit interface for clients to pass this information rather than tags. Even though tags might serve the purpose, they don't seem like a graceful interface for clients. [Data Model] Implement client-side API for handling flows - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
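As a concrete illustration of the tag-based approach being discussed (the tag prefixes below are made up for the example, not a defined convention; appContext is an assumed ApplicationSubmissionContext):
{code}
// Sketch: a client passing flow identity to YARN as application tags
// at submission time, using the existing application-tags API.
Set<String> tags = new HashSet<>();
tags.add("TIMELINE_FLOW_NAME_TAG:my-flow");      // hypothetical tag prefix
tags.add("TIMELINE_FLOW_RUN_ID_TAG:1423686851"); // hypothetical tag prefix
appContext.setApplicationTags(tags);
{code}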
[jira] [Updated] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2820: Description: Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} It would be better to improve FileSystemRMStateStore to configure dfs.client.block.write.locateFollowingBlock.retries to a bigger value for better error recovery. The default value for dfs.client.block.write.locateFollowingBlock.retries is 5. {code} public static final int DFS_CLIENT_BLOCK_WRITE_LOCATEFOLLOWINGBLOCK_RETRIES_DEFAULT = 5; {code} was: Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
[jira] [Updated] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2820: Description: Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} It would be better to improve FileSystemRMStateStore to configure dfs.client.block.write.locateFollowingBlock.retries to a bigger value for better error recovery. The default value for dfs.client.block.write.locateFollowingBlock.retries is 5. {code} public static final int DFS_CLIENT_BLOCK_WRITE_LOCATEFOLLOWINGBLOCK_RETRIES_DEFAULT = 5; {code} was: Improve FileSystemRMStateStore to do retrying for better error recovery when update/store failures occur due to IOException from HDFS. As discussed at YARN-1778, the TestFSRMStateStore failure is also due to IOException from HDFS in storeApplicationStateInternal. We will address YARN-1778 in this JIRA also. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw the following IOException cause the RM shutdown. {code} FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at
[jira] [Commented] (YARN-2261) YARN should have a way to run post-application cleanup
[ https://issues.apache.org/jira/browse/YARN-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323321#comment-14323321 ] Xuan Gong commented on YARN-2261: - Thanks for the comments, Steve. bq. Maybe the cleanup containers could have lower limits on allocation: 1 vcore max...I'd advocate less memory, but if pmem limits are turned on that's dangerous. bq. would there be any actual/best effort offerings of the interval between AM termination and clean up scheduling? I thought about this. * Request the resource for the clean-up container separately after the application is finished/failed/killed. In this case, the clean-up container can have its own resource requirement. As per Vinod's comment, the clean-up container may not get resources because the cluster may have gotten busy after the final AM exits. * Request the resource for the clean-up container at the same time as we request the resource for the AM container. We can reserve the resource for the clean-up container; after the final AM exits, we use this reserved resource to launch the clean-up container. In this case, the clean-up container can have its own resource requirement. But this option is not ideal, because the AM does not know whether it is the final attempt. Even the RM does not know whether the current attempt is the final one or not; the RM only knows whether the previous attempt is final when it decides whether it needs to launch the next attempt. So, we would need to request the resource for the clean-up container every time we request the resource for the AM container. If the current AM container is not the final one, we will waste the resource. * Reuse the AM container resource as I proposed. If we have the container-resource-resize feature ready, we could definitely let the clean-up container have its own resource requirement. Those are all the options that I can think of for clean-up container scheduling, and that is why I propose that we just reuse the AM container resource. bq. My token concern is related to long lived apps: what tokens will they get? Currently, we could just give all the latest tokens which the AM has. I understand that for LRS apps this is not enough, but I think the AM has a similar token renew/token update issue; we could fix those together. bq. How does this mix up with pre-emption? This is a good point. The resource for the clean-up container still belongs to the application. I think we could do either: * if the container is a clean-up container, we cannot pre-empt it, OR * if the clean-up container is pre-empted, we simply stop the clean-up process without retry and mark it as a clean-up failure. YARN should have a way to run post-application cleanup -- Key: YARN-2261 URL: https://issues.apache.org/jira/browse/YARN-2261 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli See MAPREDUCE-5956 for context. Specific options are at https://issues.apache.org/jira/browse/MAPREDUCE-5956?focusedCommentId=14054562&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14054562. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2123) Progress bars in Web UI always at 100% (likely due to non-US locale)
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323776#comment-14323776 ] Hadoop QA commented on YARN-2123: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699209/YARN-2123-001.patch against trunk revision 9729b24. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6649//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6649//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6649//console This message is automatically generated. Progress bars in Web UI always at 100% (likely due to non-US locale) Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Attachments: YARN-2123-001.patch, screenshot.png In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323788#comment-14323788 ] Tsuyoshi OZAWA commented on YARN-2820: -- [~xgong] [~zxu] Oh, I overlooked the point. Good point, Xuan. My first suggestion is to use [DFS-level retry|https://issues.apache.org/jira/browse/YARN-1778?focusedCommentId=14319725page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14319725], but if we support generic filesystems which are not related to HDFS, it looks better to implement RMStateStore-level retry as [~zxu] suggested firstly. Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} It will be better to Improve FileSystemRMStateStore to configure dfs.client.block.write.locateFollowingBlock.retries to a bigger value for better error recovery. The default value for
[jira] [Commented] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323792#comment-14323792 ] Hadoop QA commented on YARN-2820: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699212/YARN-2820.001.patch against trunk revision 9729b24. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6650//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6650//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6650//console This message is automatically generated. Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 
2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323548#comment-14323548 ] Tsuyoshi OZAWA commented on YARN-1514: -- findbugs and test failure look not related to the patch. [~jianhe], could you take a look? Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA Key: YARN-1514 URL: https://issues.apache.org/jira/browse/YARN-1514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, YARN-1514.3.patch, YARN-1514.4.patch, YARN-1514.4.patch, YARN-1514.5.patch, YARN-1514.5.patch, YARN-1514.6.patch, YARN-1514.7.patch, YARN-1514.wip-2.patch, YARN-1514.wip.patch ZKRMStateStore is very sensitive to ZNode-related operations as discussed in YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is called when RM-HA cluster does failover. Therefore, its execution time impacts failover time of RM-HA. We need utility to benchmark time execution time of ZKRMStateStore#loadStore as development tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated YARN-3025: - Attachment: yarn-3025-v3.txt work in progress: need to add the PBImpl classes. Provide API for retrieving blacklisted nodes Key: YARN-3025 URL: https://issues.apache.org/jira/browse/YARN-3025 Project: Hadoop YARN Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt We have the following method which updates the blacklist: {code} public synchronized void updateBlacklist(List<String> blacklistAdditions, List<String> blacklistRemovals) { {code} Upon AM failover, there should be an API which returns the blacklisted nodes so that the new AM can make consistent decisions. The new API can be: {code} public synchronized List<String> getBlacklistedNodes() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
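A short sketch of one way a restarted AM might use the proposed method together with the existing updateBlacklist call; getBlacklistedNodes() is the API proposed in this JIRA and is not part of the released AMRMClient interface, and java.util.List/Collections are assumed to be imported.
{code}
// Hypothetical usage after AM failover (sketch): recover the blacklist
// decisions made by the previous attempt and re-apply them on the new client.
List<String> blacklisted = amRMClient.getBlacklistedNodes();   // proposed API, not released
amRMClient.updateBlacklist(blacklisted, Collections.<String>emptyList());
{code}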
[jira] [Created] (YARN-3203) Correct the log message #AuxServices.java
Brahma Reddy Battula created YARN-3203: -- Summary: Correct the log message #AuxServices.java Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Currently the log looks like the following: WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since Class#toString() already returns a 'class ' prefix, we do not need to keep the literal 'class' in the log message. {code} Class<? extends AuxiliaryService> sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
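The "class class" duplication comes from logging the Class object itself: Class#toString() already prepends "class ", so a message that also contains the word prints it twice. A sketch of one way to avoid the duplication (not necessarily the committed patch); the surrounding names LOG, sName and service are assumed for illustration.
{code}
Class<? extends AuxiliaryService> sClass = conf.getClass(
    String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null,
    AuxiliaryService.class);
// Log the class name rather than the Class object (or drop the literal "class "
// from the message) so the output no longer reads "... is for class class ...".
LOG.warn("The Auxilurary Service named '" + sName + "' in the configuration is for "
    + sClass.getName() + " which has a name of '" + service.getName() + "'.");
{code}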
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322799#comment-14322799 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2038 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2038/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fails in trunk. Those can be reproduced in centos Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-1299: Attachment: YARN-1299.patch Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In RM log, it gives message saying 'checking for deactivate...'. It would give better meaning if this log message contains app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3203: --- Attachment: YARN-3203.patch Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3203: --- Priority: Minor (was: Major) Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3192) Empty handler for exception: java.lang.InterruptedException #WebAppProxy.java and #/ResourceManager.java
[ https://issues.apache.org/jira/browse/YARN-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322711#comment-14322711 ] Brahma Reddy Battula commented on YARN-3192: {quote} Signalling a clean shutdown is the desired action here, not exiting with a -1. Note also our use of the sole exit mechanism we allow in the Hadoop codebase, via a call to ExitUtil.terminate(-1, t);. That's new to branch-2+ as of this week; until then the code was errant. if you're going to touch join(), rather than have it throw, have it exit with a boolean to indicate managed shutdown vs interruption. It'll be ignored either way, but if it makes you confident the code is better, then I won't say no. {quote} +1 for this approach. Empty handler for exception: java.lang.InterruptedException #WebAppProxy.java and #/ResourceManager.java Key: YARN-3192 URL: https://issues.apache.org/jira/browse/YARN-3192 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: YARN-3192.patch The InterruptedException is completely ignored. As a result, any events causing this interrupt will be lost. File: org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java {code} try { event = eventQueue.take(); } catch (InterruptedException e) { LOG.error("Returning, interrupted : " + e); return; // TODO: Kill RM. } {code} File: org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java {code} public void join() { if (proxyServer != null) { try { proxyServer.join(); } catch (InterruptedException e) { } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
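A small sketch of the kind of handling being discussed for the WebAppProxy case: rather than swallowing the exception, restore the thread's interrupt status (and log it) so callers can still observe the shutdown request. This is the conventional pattern, not the exact patch under review; the LOG field is assumed to exist on the class.
{code}
public void join() {
  if (proxyServer != null) {
    try {
      proxyServer.join();
    } catch (InterruptedException e) {
      // Don't lose the interrupt: record it and re-assert the thread's
      // interrupt status so a managed shutdown can still be detected.
      LOG.info("Interrupted while waiting for the proxy server to stop", e);
      Thread.currentThread().interrupt();
    }
  }
}
{code}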
[jira] [Commented] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322757#comment-14322757 ] Hadoop QA commented on YARN-3203: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699095/YARN-3203.patch against trunk revision ab0b958. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6643//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6643//console This message is automatically generated. Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2123) Progress bars in Web UI always at 100% (likely due to non-US locale)
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned YARN-2123: --- Assignee: Akira AJISAKA Progress bars in Web UI always at 100% (likely due to non-US locale) Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Priority: Minor Attachments: screenshot.png In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2123) Progress bars in Web UI always at 100% (likely due to non-US locale)
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2123: Attachment: YARN-2123-001.patch Attaching a patch to use {{String.format(Locale.US, format, objects)}} instead of {{String.format(format, objects)}}. I grepped %.1f and %.2f in yarn source code and fixed them. Progress bars in Web UI always at 100% (likely due to non-US locale) Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-2123-001.patch, screenshot.png In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
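A small illustration of the failure mode and of the fix described in the comment above; Locale.GERMANY is used here only as an example of a locale whose decimal mark is a comma, and java.util.Locale is assumed to be imported.
{code}
// With a non-US default locale the progress value is rendered with a comma,
// which browsers cannot parse as a CSS width, so the bar renders incorrectly.
String broken = String.format(Locale.GERMANY, "%.1f", 32.8f); // "32,8"
String fixed  = String.format(Locale.US, "%.1f", 32.8f);      // "32.8"
{code}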
[jira] [Updated] (YARN-2123) Progress bars in Web UI always at 100% (likely due to non-US locale)
[ https://issues.apache.org/jira/browse/YARN-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2123: Priority: Major (was: Minor) Progress bars in Web UI always at 100% (likely due to non-US locale) Key: YARN-2123 URL: https://issues.apache.org/jira/browse/YARN-2123 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.3.0 Reporter: Johannes Simon Assignee: Akira AJISAKA Attachments: YARN-2123-001.patch, screenshot.png In our cluster setup, the YARN web UI always shows progress bars at 100% (see screenshot, progress of the reduce step is roughly at 32.82%). I opened the HTML source code to check (also see screenshot), and it seems the problem is that it uses a comma as decimal mark, where most browsers expect a dot for floating-point numbers. This could possibly be due to localized number formatting being used in the wrong place, which would also explain why this bug is not always visible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323697#comment-14323697 ] Zhijie Shen commented on YARN-3166: --- Records is better to be in {{org.apache.hadoop.yarn.api.records.timelineservice.*}}? [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) [Data Model] create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323706#comment-14323706 ] Naganarasimha G R commented on YARN-3041: - Few other minor comments : # flow version is not captured as class member of FlowEntity # For FlowEntity bq. ACCEPTABLE_ENTITY_TYPES.add(ApplicationEntity.TYPE); Is this valid ? i was under the assumption that only FlowRun and cluster will be having ApplicationEntity as child [Data Model] create the ATS entity/event API Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: Data_model_proposal_v2.pdf, YARN-3041.2.patch, YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2820) Improve FileSystemRMStateStore to do retrying for better error recovery when update/store failure.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2820: Attachment: YARN-2820.001.patch Improve FileSystemRMStateStore to do retrying for better error recovery when update/store failure. -- Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch Improve FileSystemRMStateStore to do retrying for better error recovery when update/store failure due to IOException from HDFS. As discussed at YARN-1778, TestFSRMStateStore failure is also due to IOException from HDFS in storeApplicationStateInternal. We will address YARN-1778 in this JIRA also. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} The IOexception from YARN-1778 is {code} 2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still not started at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:971) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2128) at org.apache.hadoop.ipc.Client.call(Client.java:1474) at org.apache.hadoop.ipc.Client.call(Client.java:1405) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy23.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at
[jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323719#comment-14323719 ] zhihai xu commented on YARN-1778: - Hi [~ozawa], I uploaded a new patch at YARN-2820. Could you review it? thanks TestFSRMStateStore fails on trunk - Key: YARN-1778 URL: https://issues.apache.org/jira/browse/YARN-1778 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: zhihai xu Attachments: YARN-1778.000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323732#comment-14323732 ] Xuan Gong commented on YARN-2820: - [~zxu], [~ozawa] Thanks for working on this. I understand the problem. But I am not sure whether this is a good idea to do it. For using FileSystemRMStateStore, we are depended on the underly FileSystem (either HDFS or other distributed system). I think that we should be consistent with the configurations set for the FS. By changing the configuration, it will make it un-consistent. Do you think that is a good idea ? Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. Key: YARN-2820 URL: https://issues.apache.org/jira/browse/YARN-2820 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2820.000.patch, YARN-2820.001.patch Improve FileSystemRMStateStore to customize hdfs client retries for locateFollowingBlock and completeFile for better error recovery. When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We saw the following IOexception cause the RM shutdown. {code} 2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_01 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ appattempt_1409135750325_109118_01.new.tmp retrying... 2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_01 java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_01 2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: java.io.IOException: Unable to close file because the last block does not have enough number of replicas. 
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} It will be better to Improve FileSystemRMStateStore to configure dfs.client.block.write.locateFollowingBlock.retries to a bigger value for better error recovery. The default value for
[jira] [Created] (YARN-3205) FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration.
zhihai xu created YARN-3205: --- Summary: FileSystemRMStateStore should disable FileSystem Cache to avoid getting a FileSystem with an old configuration. Key: YARN-3205 URL: https://issues.apache.org/jira/browse/YARN-3205 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu FileSystemRMStateStore should disable the FileSystem cache to avoid getting a FileSystem with an old configuration. The old configuration may not contain the customized DFS client configurations for FileSystemRMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
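A sketch of what disabling the FileSystem cache amounts to here; the per-scheme fs.&lt;scheme&gt;.impl.disable.cache switch is a standard Hadoop setting, but the surrounding code is assumed for illustration rather than taken from the eventual patch.
{code}
// With the cache enabled, fsWorkingPath.getFileSystem(conf) may return an
// instance created earlier from a configuration that lacks the state store's
// customized dfs.client.* settings. Disabling the cache for this scheme
// forces a fresh FileSystem to be built from the local conf copy.
Configuration conf = new Configuration(getConfig());
conf.setBoolean("fs." + fsWorkingPath.toUri().getScheme() + ".impl.disable.cache", true);
fs = fsWorkingPath.getFileSystem(conf);
{code}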
[jira] [Updated] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1299: - Assignee: Devaraj K Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Assignee: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In RM log, it gives message saying 'checking for deactivate...'. It would give better meaning if this log message contains app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1299: - Assignee: (was: Devaraj K) Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In RM log, it gives message saying 'checking for deactivate...'. It would give better meaning if this log message contains app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322826#comment-14322826 ] Tsuyoshi OZAWA commented on YARN-3203: -- +1, committing this shortly. Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct the log message in AuxServices
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3203: - Summary: Correct the log message in AuxServices (was: Correct the log message #AuxServices.java) Correct the log message in AuxServices -- Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct the log message #AuxServices.java
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3203: - Issue Type: Improvement (was: Bug) Correct the log message #AuxServices.java - Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1299) Improve 'checking for deactivate...' log message by adding app id
[ https://issues.apache.org/jira/browse/YARN-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322823#comment-14322823 ] Tsuyoshi OZAWA commented on YARN-1299: -- +1, pending for Jenkins. Improve 'checking for deactivate...' log message by adding app id - Key: YARN-1299 URL: https://issues.apache.org/jira/browse/YARN-1299 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.1-beta Reporter: Devaraj K Attachments: YARN-1299.patch, yarn-1299.patch {code:xml} 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... 2013-10-07 19:28:35,365 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate... {code} In RM log, it gives message saying 'checking for deactivate...'. It would give better meaning if this log message contains app id. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3203) Correct a log message in AuxServices
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-3203: - Summary: Correct a log message in AuxServices (was: Correct the log message in AuxServices) Correct a log message in AuxServices Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322831#comment-14322831 ] Jagadesh Kiran N commented on YARN-3195: Hi Devaraj K, thanks for your review; please find my analysis below. I have not considered the YARN commands that are direct executables and do not require help, e.g. ./yarn classpath xxx or ./yarn version. Please check the inconsistency below. *Help is present for these commands:* ./yarn container: -help is present; displays help when run with -help. ./yarn rmadmin: -help is present; displays help when run with -help. ./yarn application: -help is present; displays help when run with -help. ./yarn applicationattempt: -help is present; displays help when run with -help. ./yarn queue: -help is present; displays help when run. *Help is not present for these commands:* ./yarn: -help is missing; ./yarn -help displays the help. ./yarn node: -help is missing; ./yarn node -help throws the exception Unrecognized option: -help. ./yarn logs: -help is not present; ./yarn logs -help displays the help. ./yarn daemonlog: -help is not present; ./yarn daemonlog -help displays the help. *For the commands where it is not present, I want to add -help; please check and confirm so that I can go ahead.* [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch, YARN-3195.patch Help is a generic option and should not be placed here; because of this, uniformity is missing compared to other commands. Remove the -help command inside ./yarn queue for uniformity with respect to the other commands. {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322839#comment-14322839 ] Allen Wittenauer commented on YARN-3168: Both. I start with a bunch of scripts I wrote + doxia-converter and then do a manual pass over it. Then I upload the patch, let someone else (usually the very awesome [~iwasakims]) do a second manual pass over it to fix the things I missed. Then I'll review it and commit it as appropriate, knowing that we can always go back and fix things in subsequent JIRAs since this is for trunk and not for branch-2. Keep in mind that *any delay* results in the source changing so the patch will no longer apply, and I'm out of town at the moment and won't be able to generate a new patch until next week. Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3203) Correct a log message in AuxServices
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322862#comment-14322862 ] Hudson commented on YARN-3203: -- FAILURE: Integrated in Hadoop-trunk-Commit #7118 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7118/]) YARN-3203. Correct a log message in AuxServices. Contributed by Brahma Reddy Battula. (ozawa: rev 447bd7b5a61a5788dc2a5d29cedfc19f0e99c0f5) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java Correct a log message in AuxServices Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.7.0 Attachments: YARN-3203.patch Currently log is coming like following.. WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since get class will return class as prefix,, we no need keep class in log.. {code} Class? extends AuxiliaryService sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3203) Correct a log message in AuxServices
[ https://issues.apache.org/jira/browse/YARN-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322866#comment-14322866 ] Brahma Reddy Battula commented on YARN-3203: Thanks a lot [~ozawa]!!! Correct a log message in AuxServices Key: YARN-3203 URL: https://issues.apache.org/jira/browse/YARN-3203 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.7.0 Attachments: YARN-3203.patch Currently the log message comes out like the following: WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for *{color:red}class class{color}* org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. Since getClass() already yields a 'class ' prefix, we need not keep the literal 'class' in the log message. {code} Class<? extends AuxiliaryService> sClass = conf.getClass( String.format(YarnConfiguration.NM_AUX_SERVICE_FMT, sName), null, AuxiliaryService.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322881#comment-14322881 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2057 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2057/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk; the failures can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322879#comment-14322879 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #107 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/107/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk; the failures can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322631#comment-14322631 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #106 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/106/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk; the failures can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2749) Some testcases from TestLogAggregationService fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322635#comment-14322635 ] Hudson commented on YARN-2749: -- FAILURE: Integrated in Hadoop-Yarn-trunk #840 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/840/]) YARN-2749. Fix some testcases from TestLogAggregationService fails in trunk. (Contributed by Xuan Gong) (junping_du: rev ab0b958a522d502426b91b6e4ab6dd29caccc372) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java Some testcases from TestLogAggregationService fails in trunk Key: YARN-2749 URL: https://issues.apache.org/jira/browse/YARN-2749 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-2749.1.patch, YARN-2749.2.patch, YARN-2749.2.patch Some testcases from TestLogAggregationService fail in trunk; the failures can be reproduced on CentOS. Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationService(TestLogAggregationService.java:1362) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLogAggregationServiceWithRetention(TestLogAggregationService.java:1290) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command
[ https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322525#comment-14322525 ] Devaraj K commented on YARN-3195: - Thanks [~jagadesh.kiran] for your contribution. I am not sure which commands you are referring to for uniformity. The commands that, like the 'queue' command, support '-help' are listed below. {code:xml} yarn application -help yarn applicationattempt -help yarn container -help yarn rmadmin -help yarn scmadmin -help {code} IMO, removing it is not the right thing to do here; instead, adding '-help' to the missing commands would help users. [YARN]Missing uniformity In Yarn Queue CLI command --- Key: YARN-3195 URL: https://issues.apache.org/jira/browse/YARN-3195 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.6.0 Environment: SUSE Linux SP3 Reporter: Jagadesh Kiran N Assignee: Jagadesh Kiran N Priority: Minor Fix For: 2.7.0 Attachments: Helptobe removed in Queue.png, YARN-3195.patch, YARN-3195.patch '-help' is a generic option and should not be listed as a queue subcommand; because of this, uniformity is missing compared to the other commands. Remove the -help entry inside ./yarn queue for uniformity with respect to the other commands. {code} SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue -help 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn queue 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Invalid Command Usage : usage: queue * -help Displays help for all commands.* -status Queue Name List queue information about given queue. {code} * -help Displays help for all commands.* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
Rohith created YARN-3202: Summary: Improve master container resource release time ICO work preserving restart enabled Key: YARN-3202 URL: https://issues.apache.org/jira/browse/YARN-3202 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor While the NM is registering with the RM, if the NM sends a completed-container status for the master container, the master container's resources are released immediately by triggering the CONTAINER_FINISHED event. This frees all the resources held by the master container so they can be allocated to other pending resource requests from applications. But in case of (ICO) RM work-preserving restart being enabled, if the master container's state is completed, the attempt does not move to FINISHING until container expiry is triggered by the container liveliness monitor. I think the code below need not check whether work-preserving restart is enabled, so that the master container's resources are released immediately and allocated to other pending resource requests of different applications: {code} // Handle received container status, this should be processed after new // RMNode inserted if (!rmContext.isWorkPreservingRecoveryEnabled()) { if (!request.getNMContainerStatuses().isEmpty()) { LOG.info("received container statuses on node manager register :" + request.getNMContainerStatuses()); for (NMContainerStatus status : request.getNMContainerStatuses()) { handleNMContainerStatus(status, nodeId); } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
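A sketch of what this proposal amounts to: drop the work-preserving-recovery guard so the reported statuses are handled on every NM registration. This mirrors the quoted snippet with the outer check removed and is illustrative, not the actual patch:
{code}
// Handle received container statuses unconditionally, so the master
// container's resources are released as soon as the NM reports it
// COMPLETED, instead of waiting for the container liveliness expiry.
if (!request.getNMContainerStatuses().isEmpty()) {
  LOG.info("received container statuses on node manager register :"
      + request.getNMContainerStatuses());
  for (NMContainerStatus status : request.getNMContainerStatuses()) {
    handleNMContainerStatus(status, nodeId);
  }
}
{code}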
[jira] [Updated] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gururaj Shetty updated YARN-3187: - Attachment: YARN-3187.2.patch Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.6.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch YARN-2411 exposes a very useful feature, {{support simple user and group mappings to queues}}, but it's not captured in the documentation. In this JIRA we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
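For context on the feature being documented, a hedged illustration of what a YARN-2411 queue mapping can look like in capacity-scheduler.xml; the user, group, and queue names here are made up, and the exact wording of the documentation belongs to the patch under review:
{code:xml}
<!-- Illustrative only: map user 'alice' to queue 'engineering', members of
     group 'analysts' to queue 'reports', and any other user to a queue
     named after that user via the %user placeholder. -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:alice:engineering,g:analysts:reports,u:%user:%user</value>
</property>
{code}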
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322607#comment-14322607 ] Hadoop QA commented on YARN-3187: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12699063/YARN-3187.2.patch against trunk revision ab0b958. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6641//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6641//console This message is automatically generated. Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.6.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch YARN-2411 exposes a very useful feature, {{support simple user and group mappings to queues}}, but it's not captured in the documentation. In this JIRA we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322610#comment-14322610 ] Gururaj Shetty commented on YARN-3187: -- Please review, [~Naganarasimha Garla], [~aw], [~jianhe], [~djp]. Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.6.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch YARN-2411 exposes a very useful feature, {{support simple user and group mappings to queues}}, but it's not captured in the documentation. In this JIRA we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group
[ https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322583#comment-14322583 ] Gururaj Shetty commented on YARN-3187: -- I have completed the documentation of user and queue mapping. [~Naganarasimha Garla] / [~aw] / [~jianhe] / [~djp], please review. Documentation of Capacity Scheduler Queue mapping based on user or group Key: YARN-3187 URL: https://issues.apache.org/jira/browse/YARN-3187 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, documentation Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Gururaj Shetty Labels: documentation Fix For: 2.6.0 Attachments: YARN-3187.1.patch, YARN-3187.2.patch YARN-2411 exposes a very useful feature, {{support simple user and group mappings to queues}}, but it's not captured in the documentation. In this JIRA we plan to document this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322604#comment-14322604 ] Gururaj Shetty commented on YARN-3168: -- Thanks [~aw] for the patch. Do you convert the .apt files to markdown manually, or do you use a tool for the conversion? Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2799) cleanup TestLogAggregationService based on the change in YARN-90
[ https://issues.apache.org/jira/browse/YARN-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322656#comment-14322656 ] Junping Du commented on YARN-2799: -- Latest patch looks good to me. Kicking off the Jenkins test again (as the patch is two days old); +1 pending the Jenkins result. cleanup TestLogAggregationService based on the change in YARN-90 Key: YARN-2799 URL: https://issues.apache.org/jira/browse/YARN-2799 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2799.000.patch, YARN-2799.001.patch, YARN-2799.002.patch Clean up TestLogAggregationService based on the change in YARN-90. The following code was added to setup() in YARN-90: {code} dispatcher = createDispatcher(); appEventHandler = mock(EventHandler.class); dispatcher.register(ApplicationEventType.class, appEventHandler); {code} Given this, we should remove this code from each test function to avoid duplication. The same applies to dispatcher.stop(), which is in tearDown(): we can remove it from each test function as well, because it will always be called from tearDown() for each test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
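A sketch of the consolidation under review, assuming JUnit 4 and Mockito as the existing test already uses; the field and method names mirror the snippet above rather than the exact patch:
{code}
// Create and register the dispatcher once per test in setup(), and stop it
// once in tearDown(), instead of repeating both in every test method.
@Before
public void setup() throws Exception {
  dispatcher = createDispatcher();
  appEventHandler = mock(EventHandler.class);
  dispatcher.register(ApplicationEventType.class, appEventHandler);
}

@After
public void tearDown() {
  dispatcher.stop();
}
{code}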