[jira] [Commented] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546820#comment-14546820
 ] 

Hudson commented on YARN-3526:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2145 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2145/])
YARN-3526. ApplicationMaster tracking URL is incorrectly redirected on a QJM 
cluster. Contributed by Weiwei Yang (xgong: rev 
b0ad644083a0dfae3a39159ac88b6fc09d846371)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt


 ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
 -

 Key: YARN-3526
 URL: https://issues.apache.org/jira/browse/YARN-3526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Red Hat Enterprise Linux Server 6.4 
Reporter: Weiwei Yang
Assignee: Weiwei Yang
  Labels: BB2015-05-TBR
 Fix For: 2.7.1

 Attachments: YARN-3526.001.patch, YARN-3526.002.patch


 On a QJM HA cluster, when viewing the RM web UI to track job status, it shows
 This is standby RM. Redirecting to the current active RM: 
 http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce
 The page refreshes every 3 seconds but never reaches the correct tracking page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546799#comment-14546799
 ] 

Hudson commented on YARN-3526:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #197 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/197/])
YARN-3526. ApplicationMaster tracking URL is incorrectly redirected on a QJM 
cluster. Contributed by Weiwei Yang (xgong: rev 
b0ad644083a0dfae3a39159ac88b6fc09d846371)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java


 ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
 -

 Key: YARN-3526
 URL: https://issues.apache.org/jira/browse/YARN-3526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Red Hat Enterprise Linux Server 6.4 
Reporter: Weiwei Yang
Assignee: Weiwei Yang
  Labels: BB2015-05-TBR
 Fix For: 2.7.1

 Attachments: YARN-3526.001.patch, YARN-3526.002.patch


 On a QJM HA cluster, when viewing the RM web UI to track job status, it shows
 This is standby RM. Redirecting to the current active RM: 
 http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce
 The page refreshes every 3 seconds but never reaches the correct tracking page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) RM still allocates containers to an app in the FINISHING state

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546800#comment-14546800
 ] 

Hudson commented on YARN-2421:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #197 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/197/])
YARN-2421. RM still allocates containers to an app in the FINISHING state. 
Contributed by Chang Li (jlowe: rev f7e051c4310024d4040ad466c34432c72e88b0fc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java


 RM still allocates containers to an app in the FINISHING state
 --

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Fix For: 2.8.0

 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 
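
 For illustration, a minimal sketch of the kind of terminal-state guard suggested above; 
 the enum values mirror RMAppAttemptState, but the helper and its placement are 
 assumptions for the example, not the change committed in ApplicationMasterService.java.
 {code}
 import java.util.EnumSet;

 // Hypothetical guard: skip allocation once the attempt is finishing or terminal.
 public class AllocationGuard {

   enum AttemptState { RUNNING, FINISHING, FINISHED, FAILED, KILLED }

   private static final EnumSet<AttemptState> TERMINAL_OR_FINISHING =
       EnumSet.of(AttemptState.FINISHING, AttemptState.FINISHED,
                  AttemptState.FAILED, AttemptState.KILLED);

   /** Returns true if the scheduler may still hand out containers to this attempt. */
   static boolean mayAllocate(AttemptState state) {
     return !TERMINAL_OR_FINISHING.contains(state);
   }

   public static void main(String[] args) {
     System.out.println(mayAllocate(AttemptState.RUNNING));   // true
     System.out.println(mayAllocate(AttemptState.FINISHING)); // false
   }
 }
 {code}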



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-05-16 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3591:

Assignee: Lavkesh Lahngir

 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch


 This happens when a resource is localised on a disk and that disk goes bad 
 after localisation. The NM keeps the paths of localised resources in memory. At 
 the time of a resource request, isResourcePresent(rsrc) is called, which calls 
 file.exists() on the localised path.
 In some cases, when the disk has gone bad, the inodes are still cached and 
 file.exists() returns true, but the file cannot be opened for reading.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find the inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 calls open() natively. If the disk is good, it should return an array of 
 paths with length at least 1.
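
 A minimal sketch of the proposed check, assuming the plain java.io.File API; the 
 looksPresentOnDisk() helper is illustrative, not the NM's actual code.
 {code}
 import java.io.File;

 public class LocalResourceCheck {
   /**
    * file.exists() may answer true from cached inode data even when the disk is bad.
    * Listing the parent directory forces an open() on the disk, so a null result
    * signals a disk that cannot actually be read.
    */
   static boolean looksPresentOnDisk(File localizedPath) {
     if (!localizedPath.exists()) {
       return false;                                  // definitely gone
     }
     File parent = localizedPath.getParentFile();
     if (parent == null) {
       return true;                                   // no parent to probe
     }
     String[] siblings = parent.list();
     return siblings != null && siblings.length >= 1; // open() succeeded on the disk
   }

   public static void main(String[] args) {
     File f = new File(args.length > 0 ? args[0] : "/tmp");
     System.out.println(looksPresentOnDisk(f));
   }
 }
 {code}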



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546781#comment-14546781
 ] 

Hudson commented on YARN-3526:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #187 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/187/])
YARN-3526. ApplicationMaster tracking URL is incorrectly redirected on a QJM 
cluster. Contributed by Weiwei Yang (xgong: rev 
b0ad644083a0dfae3a39159ac88b6fc09d846371)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java


 ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
 -

 Key: YARN-3526
 URL: https://issues.apache.org/jira/browse/YARN-3526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Red Hat Enterprise Linux Server 6.4 
Reporter: Weiwei Yang
Assignee: Weiwei Yang
  Labels: BB2015-05-TBR
 Fix For: 2.7.1

 Attachments: YARN-3526.001.patch, YARN-3526.002.patch


 On a QJM HA cluster, when viewing the RM web UI to track job status, it shows
 This is standby RM. Redirecting to the current active RM: 
 http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce
 The page refreshes every 3 seconds but never reaches the correct tracking page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) RM still allocates containers to an app in the FINISHING state

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546782#comment-14546782
 ] 

Hudson commented on YARN-2421:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #187 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/187/])
YARN-2421. RM still allocates containers to an app in the FINISHING state. 
Contributed by Chang Li (jlowe: rev f7e051c4310024d4040ad466c34432c72e88b0fc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt


 RM still allocates containers to an app in the FINISHING state
 --

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Fix For: 2.8.0

 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546780#comment-14546780
 ] 

Hudson commented on YARN-3505:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #187 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/187/])
YARN-3505 addendum: fix an issue in previous patch. (junping_du: rev 
03a293aed6de101b0cae1a294f506903addcaa75)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per discussions in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for reports that finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) RM still allocates containers to an app in the FINISHING state

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546821#comment-14546821
 ] 

Hudson commented on YARN-2421:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2145 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2145/])
YARN-2421. RM still allocates containers to an app in the FINISHING state. 
Contributed by Chang Li (jlowe: rev f7e051c4310024d4040ad466c34432c72e88b0fc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java


 RM still allocates containers to an app in the FINISHING state
 --

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Fix For: 2.8.0

 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3651) Tracking url in ApplicationCLI wrong for running application

2015-05-16 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546753#comment-14546753
 ] 

Devaraj K commented on YARN-3651:
-

It seems SSL was intentionally disabled for the MR AM; please refer to the inline 
comment in MRClientService.java. You can enable it for all YARN daemons and 
also for the job history server using the configurations.

 Tracking url in ApplicationCLI wrong for running application
 

 Key: YARN-3651
 URL: https://issues.apache.org/jira/browse/YARN-3651
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Priority: Minor

 Application URL in the Application CLI is wrong.
 Steps to reproduce
 ==
 1. Start an HA setup in insecure mode
 2. Configure HTTPS_ONLY
 3. Submit an application to the cluster
 4. Execute the command ./yarn application -list
 5. Observe the tracking URL shown
 {code}
 15/05/15 13:34:38 INFO client.AHSProxy: Connecting to Application History 
 server at /IP:45034
 Total number of applications (application-types: [] and states: [SUBMITTED, 
 ACCEPTED, RUNNING]):1
 Application-Id --- Tracking-URL
 application_1431672734347_0003   *http://host-10-19-92-117:13013*
 {code}
 *Expected*
 https://IP:64323/proxy/application_1431672734347_0003 /



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546770#comment-14546770
 ] 

Hudson commented on YARN-3505:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2127 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2127/])
YARN-3505 addendum: fix an issue in previous patch. (junping_du: rev 
03a293aed6de101b0cae1a294f506903addcaa75)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per discussions in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for reports that finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) RM still allocates containers to an app in the FINISHING state

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546772#comment-14546772
 ] 

Hudson commented on YARN-2421:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2127 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2127/])
YARN-2421. RM still allocates containers to an app in the FINISHING state. 
Contributed by Chang Li (jlowe: rev f7e051c4310024d4040ad466c34432c72e88b0fc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java


 RM still allocates containers to an app in the FINISHING state
 --

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Fix For: 2.8.0

 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546771#comment-14546771
 ] 

Hudson commented on YARN-3526:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2127 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2127/])
YARN-3526. ApplicationMaster tracking URL is incorrectly redirected on a QJM 
cluster. Contributed by Weiwei Yang (xgong: rev 
b0ad644083a0dfae3a39159ac88b6fc09d846371)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* hadoop-yarn-project/CHANGES.txt


 ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
 -

 Key: YARN-3526
 URL: https://issues.apache.org/jira/browse/YARN-3526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Red Hat Enterprise Linux Server 6.4 
Reporter: Weiwei Yang
Assignee: Weiwei Yang
  Labels: BB2015-05-TBR
 Fix For: 2.7.1

 Attachments: YARN-3526.001.patch, YARN-3526.002.patch


 On a QJM HA cluster, when viewing the RM web UI to track job status, it shows
 This is standby RM. Redirecting to the current active RM: 
 http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce
 The page refreshes every 3 seconds but never reaches the correct tracking page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2748) Upload logs in the sub-folders under the local log dir when aggregating logs

2015-05-16 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546792#comment-14546792
 ] 

Varun Saxena commented on YARN-2748:


[~vinodkv], IIUC the use case Sumit had, which led to the filing of YARN-2734, was 
that rolling log files were being backed up in a sub-folder. Sumit didn't 
get back to confirm this, however.

 Upload logs in the sub-folders under the local log dir when aggregating logs
 

 Key: YARN-2748
 URL: https://issues.apache.org/jira/browse/YARN-2748
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.6.0
Reporter: Zhijie Shen
Assignee: Varun Saxena
  Labels: BB2015-05-RFC
 Attachments: YARN-2748.001.patch, YARN-2748.002.patch, 
 YARN-2748.03.patch, YARN-2748.04.patch


 YARN-2734 has a temporary fix to skip sub-folders to avoid an exception. Ideally, 
 if the app creates a sub-folder and puts its rolling logs there, we 
 need to upload those logs as well.
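
 A minimal sketch of that idea, assuming aggregation walks a container's local log 
 directory; the class and method names are illustrative, not AppLogAggregatorImpl's 
 real API.
 {code}
 import java.io.File;
 import java.util.ArrayList;
 import java.util.List;

 public class LogFileCollector {
   /** Recursively collect regular files, including those in rolling-log sub-folders. */
   static List<File> collectLogFiles(File dir) {
     List<File> result = new ArrayList<>();
     File[] children = dir.listFiles();
     if (children == null) {
       return result;                               // unreadable dir: nothing to upload
     }
     for (File child : children) {
       if (child.isDirectory()) {
         result.addAll(collectLogFiles(child));     // descend instead of skipping
       } else {
         result.add(child);
       }
     }
     return result;
   }
 }
 {code}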



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546723#comment-14546723
 ] 

Hudson commented on YARN-3526:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #929 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/929/])
YARN-3526. ApplicationMaster tracking URL is incorrectly redirected on a QJM 
cluster. Contributed by Weiwei Yang (xgong: rev 
b0ad644083a0dfae3a39159ac88b6fc09d846371)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java


 ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
 -

 Key: YARN-3526
 URL: https://issues.apache.org/jira/browse/YARN-3526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Red Hat Enterprise Linux Server 6.4 
Reporter: Weiwei Yang
Assignee: Weiwei Yang
  Labels: BB2015-05-TBR
 Fix For: 2.7.1

 Attachments: YARN-3526.001.patch, YARN-3526.002.patch


 On a QJM HA cluster, when viewing the RM web UI to track job status, it shows
 This is standby RM. Redirecting to the current active RM: 
 http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce
 The page refreshes every 3 seconds but never reaches the correct tracking page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546722#comment-14546722
 ] 

Hudson commented on YARN-3505:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #929 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/929/])
YARN-3505 addendum: fix an issue in previous patch. (junping_du: rev 
03a293aed6de101b0cae1a294f506903addcaa75)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per discussions in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for reports that finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) RM still allocates containers to an app in the FINISHING state

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546724#comment-14546724
 ] 

Hudson commented on YARN-2421:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #929 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/929/])
YARN-2421. RM still allocates containers to an app in the FINISHING state. 
Contributed by Chang Li (jlowe: rev f7e051c4310024d4040ad466c34432c72e88b0fc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt


 RM still allocates containers to an app in the FINISHING state
 --

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Fix For: 2.8.0

 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-16 Thread Raju Bairishetti (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547040#comment-14547040
 ] 

Raju Bairishetti commented on YARN-3644:


[~sandflee] Yes, the NM should catch the exception and keep itself alive.

Right now, the NM shuts itself down only in the case of connection failures. The NM 
ignores all other kinds of exceptions and errors while sending heartbeats.
{code}
} catch (ConnectException e) {
  // catch and throw the exception if tried MAX wait time to connect RM
  dispatcher.getEventHandler().handle(
      new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
  throw new YarnRuntimeException(e);
} catch (Throwable e) {
  // TODO Better error handling. Thread can die with the rest of the
  // NM still running.
  LOG.error("Caught exception in status-updater", e);
}
{code}
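
For illustration, a minimal sketch of the alternative being discussed: keep the 
status-updater thread alive and retry instead of dispatching SHUTDOWN on 
ConnectException. The loop, the sendHeartbeat() stand-in, and the 10-second back-off 
are assumptions, not the NodeStatusUpdater's real code.
{code}
import java.net.ConnectException;

public class ResilientStatusUpdater {
  private volatile boolean shouldRun = true;

  void sendHeartbeat() throws ConnectException {
    // stand-in for the real RM heartbeat RPC
  }

  void run() throws InterruptedException {
    while (shouldRun) {
      try {
        sendHeartbeat();
        Thread.sleep(1_000L);             // normal heartbeat interval (illustrative)
      } catch (ConnectException e) {
        // RM unreachable (e.g. down for maintenance): log, back off, and retry
        System.err.println("RM not reachable, will retry heartbeat: " + e);
        Thread.sleep(10_000L);
      }
    }
  }
}
{code}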

 Node manager shuts down if unable to connect with RM
 

 Key: YARN-3644
 URL: https://issues.apache.org/jira/browse/YARN-3644
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Srikanth Sundarrajan

 When the NM is unable to connect to the RM, the NM shuts itself down.
 {code}
 } catch (ConnectException e) {
   // catch and throw the exception if tried MAX wait time to connect RM
   dispatcher.getEventHandler().handle(
       new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
   throw new YarnRuntimeException(e);
 {code}
 In large clusters, if the RM is down for maintenance for a longer period, all the 
 NMs shut themselves down, requiring additional work to bring up the NMs.
 Setting yarn.resourcemanager.connect.wait-ms to -1 has other side 
 effects, where non-connection failures are retried infinitely by all 
 YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-05-16 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546587#comment-14546587
 ] 

Xuan Gong commented on YARN-221:


[~mingma] Thanks for working on this. I have some general comments I would like to 
discuss with you.
We could have a common interface called ContainerLogAggregationPolicy which 
includes at least this function:
* doLogAggregationForContainer (you might need a better name). This 
function would be called by AppLogAggregator to check whether the log for a 
container needs to be aggregated.

So, instead of creating an enum type ContainerLogAggregationPolicy:
{code}
AGGREGATE, DO_NOT_AGGREGATE, AGGREGATE_FAILED, AGGREGATE_FAILED_OR_KILLED
{code}

we could create some basic policies which implement the common interface 
ContainerLogAggregationPolicy, such as AllContainerLogAggregationPolicy, 
NonContainerLogAggregationPolicy, AMContainerOnlyLogAggregationPolicy, 
FailContainerOnlyLogAggregationPolicy, SampleRateContainerLogAggregationPolicy, 
etc.
I think this way might be more extensible, and in the future clients can 
implement their own ContainerLogAggregationPolicy which can be more complex.
With this, we do not need to add any new configurations on the service side, so the 
following
{code}
+  public static final String LOG_AGGREGATION_SAMPLE_PERCENT = NM_PREFIX
+      + "log-aggregation.worker-sample-percent";
+  public static final float DEFAULT_LOG_AGGREGATION_SAMPLE_PERCENT = 1.0f;
+
+  public static final String LOG_AGGREGATION_AM_LOGS = NM_PREFIX
+      + "log-aggregation.am-enable";
+  public static final boolean DEFAULT_LOG_AGGREGATION_AM_LOGS = true;
{code}
can be removed.

Also, instead of adding ContainerLogAggregationPolicy to the CLC, we could add 
ContainerLogAggregationPolicy to LogAggregationContext, which can already be 
accessed by the NM.

Thoughts ?
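
To make the proposal concrete, a minimal sketch of what such an interface and two of 
the basic policies could look like; the shouldAggregate signature and its parameters 
are assumptions for illustration, not an agreed API.
{code}
public interface ContainerLogAggregationPolicy {
  /** Decide whether AppLogAggregator should upload this container's logs. */
  boolean shouldAggregate(String containerId, int exitCode, boolean isAmContainer);
}

class AllContainerLogAggregationPolicy implements ContainerLogAggregationPolicy {
  public boolean shouldAggregate(String containerId, int exitCode, boolean isAmContainer) {
    return true;                          // aggregate every container's logs
  }
}

class FailContainerOnlyLogAggregationPolicy implements ContainerLogAggregationPolicy {
  public boolean shouldAggregate(String containerId, int exitCode, boolean isAmContainer) {
    return exitCode != 0;                 // only keep logs of failed containers
  }
}
{code}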

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Attachments: YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, 
 YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority. The 
 AM should be able to do this in the container launch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow the NM to not aggregate logs in some cases, and avoid 
 connecting to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure

2015-05-16 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546590#comment-14546590
 ] 

zhihai xu commented on YARN-3591:
-

[~lavkesh], Currently DirectoryCollection supports {{fullDirs}} and 
{{errorDirs}}; neither are good dirs. IMO {{fullDirs}} are the disks which can 
become good again once the localized files are deleted by the above cache clean-up, and 
{{errorDirs}} are the corrupted disks which can't become good until somebody fixes 
them manually. Calling removeResource for localized resources in {{errorDirs}} 
sounds reasonable to me.

 Resource Localisation on a bad disk causes subsequent containers failure 
 -

 Key: YARN-3591
 URL: https://issues.apache.org/jira/browse/YARN-3591
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
 Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, 
 YARN-3591.2.patch


 This happens when a resource is localised on a disk and that disk goes bad 
 after localisation. The NM keeps the paths of localised resources in memory. At 
 the time of a resource request, isResourcePresent(rsrc) is called, which calls 
 file.exists() on the localised path.
 In some cases, when the disk has gone bad, the inodes are still cached and 
 file.exists() returns true, but the file cannot be opened for reading.
 Note: file.exists() actually calls stat64 natively, which returns true because 
 it was able to find the inode information from the OS.
 A proposal is to call file.list() on the parent path of the resource, which 
 calls open() natively. If the disk is good, it should return an array of 
 paths with length at least 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-05-16 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3489:
---
Attachment: YARN-3489-branch-2.7.03.patch

 RMServerUtils.validateResourceRequests should only obtain queue info once
 -

 Key: YARN-3489
 URL: https://issues.apache.org/jira/browse/YARN-3489
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
  Labels: BB2015-05-RFC
 Attachments: YARN-3489-branch-2.7.02.patch, 
 YARN-3489-branch-2.7.03.patch, YARN-3489-branch-2.7.patch, 
 YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch


 Since the label support was added we now get the queue info for each request 
 being validated in SchedulerUtils.validateResourceRequest.  If 
 validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
 large cluster with lots of varied locality in the requests) then it will get 
 the queue info for each request.  Since we build the queue info this 
 generates a lot of unnecessary garbage, as the queue isn't changing between 
 requests.  We should grab the queue info once and pass it down rather than 
 building it again for each request.
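
 For illustration, a minimal sketch of the hoisting the description suggests, with 
 QueueInfo, ResourceRequest and getQueueInfo() as illustrative stand-ins rather than 
 the real RMServerUtils/SchedulerUtils code.
 {code}
 import java.util.List;

 public class ValidateRequestsSketch {
   static class QueueInfo { }
   static class ResourceRequest { }

   static QueueInfo getQueueInfo(String queueName) {
     return new QueueInfo();              // imagine an expensive scheduler lookup
   }

   static void validate(ResourceRequest req, QueueInfo queueInfo) {
     // label/limit checks against the already-fetched queue info
   }

   static void validateResourceRequests(List<ResourceRequest> requests, String queue) {
     QueueInfo queueInfo = getQueueInfo(queue);      // fetched once, not per request
     for (ResourceRequest req : requests) {
       validate(req, queueInfo);
     }
   }
 }
 {code}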



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) RM still allocates containers to an app in the FINISHING state

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546711#comment-14546711
 ] 

Hudson commented on YARN-2421:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #198 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/198/])
YARN-2421. RM still allocates containers to an app in the FINISHING state. 
Contributed by Chang Li (jlowe: rev f7e051c4310024d4040ad466c34432c72e88b0fc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt


 RM still allocates containers to an app in the FINISHING state
 --

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Thomas Graves
Assignee: Chang Li
 Fix For: 2.8.0

 Attachments: YARN-2421.4.patch, YARN-2421.5.patch, YARN-2421.6.patch, 
 YARN-2421.7.patch, YARN-2421.8.patch, YARN-2421.9.patch, yarn2421.patch, 
 yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3505) Node's Log Aggregation Report with SUCCEED should not cached in RMApps

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546709#comment-14546709
 ] 

Hudson commented on YARN-3505:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #198 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/198/])
YARN-3505 addendum: fix an issue in previous patch. (junping_du: rev 
03a293aed6de101b0cae1a294f506903addcaa75)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


 Node's Log Aggregation Report with SUCCEED should not cached in RMApps
 --

 Key: YARN-3505
 URL: https://issues.apache.org/jira/browse/YARN-3505
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation
Affects Versions: 2.8.0
Reporter: Junping Du
Assignee: Xuan Gong
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3505.1.patch, YARN-3505.2.patch, 
 YARN-3505.2.rebase.patch, YARN-3505.3.patch, YARN-3505.4.patch, 
 YARN-3505.5.patch, YARN-3505.6.patch, YARN-3505.addendum.patch


 Per discussions in YARN-1402, we shouldn't cache every node's log aggregation 
 report in RMApps forever, especially for reports that finished with SUCCEED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3526) ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster

2015-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546710#comment-14546710
 ] 

Hudson commented on YARN-3526:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #198 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/198/])
YARN-3526. ApplicationMaster tracking URL is incorrectly redirected on a QJM 
cluster. Contributed by Weiwei Yang (xgong: rev 
b0ad644083a0dfae3a39159ac88b6fc09d846371)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt


 ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
 -

 Key: YARN-3526
 URL: https://issues.apache.org/jira/browse/YARN-3526
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.6.0
 Environment: Red Hat Enterprise Linux Server 6.4 
Reporter: Weiwei Yang
Assignee: Weiwei Yang
  Labels: BB2015-05-TBR
 Fix For: 2.7.1

 Attachments: YARN-3526.001.patch, YARN-3526.002.patch


 On a QJM HA cluster, when viewing the RM web UI to track job status, it shows
 This is standby RM. Redirecting to the current active RM: 
 http://active-RM:8088/proxy/application_1427338037905_0008/mapreduce
 The page refreshes every 3 seconds but never reaches the correct tracking page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3561) Non-AM Containers continue to run even after AM is stopped

2015-05-16 Thread Chackaravarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chackaravarthy updated YARN-3561:
-
Attachment: application_1431771946377_0001.zip

[~gsaha] / [~jianhe]

Attached logs (application_1431771946377_0001.zip) with debug level enabled. It 
contains RM and NM logs from hosts running Slider AM and non-AM application 
containers.

container_1431771946377_0001_01_01 - host3 - SliderAM
container_1431771946377_0001_01_02 - host7 - NIMBUS
container_1431771946377_0001_01_03 - host5 - STORM_UI_SERVER
container_1431771946377_0001_01_04 - host3 - DRPC_SERVER
container_1431771946377_0001_01_05 - host6 - SUPERVISOR

*Timing of issuing the commands:*

Slider start command : 2015-05-16 15:57:11,954
Slider stop command : 2015-05-16 15:59:06,480

 Non-AM Containers continue to run even after AM is stopped
 --

 Key: YARN-3561
 URL: https://issues.apache.org/jira/browse/YARN-3561
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, yarn
Affects Versions: 2.6.0
 Environment: debian 7
Reporter: Gour Saha
Priority: Critical
 Attachments: app0001.zip, application_1431771946377_0001.zip


 Non-AM containers continue to run even after the application is stopped. This 
 occurred while deploying Storm 0.9.3 using Slider (0.60.0 and 0.70.1) in a 
 Hadoop 2.6 deployment. 
 Following are the NM logs from 2 different nodes:
 *host-07* - where the Slider AM was running
 *host-03* - where the Storm NIMBUS container was running.
 *Note:* The logs are partial, starting from the time when the relevant Slider 
 AM and NIMBUS containers were allocated, until the time when the Slider AM was 
 stopped. Also, the large number of memory-usage log lines was removed, 
 keeping only a few lines from the start and end of every segment.
 *NM log from host-07 where Slider AM container was running:*
 {noformat}
 2015-04-29 00:39:24,614 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for 
 container_1428575950531_0020_02_01
 2015-04-29 00:41:10,310 INFO  ipc.Server (Server.java:saslProcess(1306)) - 
 Auth successful for appattempt_1428575950531_0021_01 (auth:SIMPLE)
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(803)) - Start request for 
 container_1428575950531_0021_01_01 by user yarn
 2015-04-29 00:41:10,322 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:startContainerInternal(843)) - Creating a new 
 application reference for app application_1428575950531_0021
 2015-04-29 00:41:10,323 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from NEW to INITING
 2015-04-29 00:41:10,325 INFO  nodemanager.NMAuditLogger 
 (NMAuditLogger.java:logSuccess(89)) - USER=yarn   IP=10.84.105.162
 OPERATION=Start Container Request   TARGET=ContainerManageImpl  
 RESULT=SUCCESS  APPID=application_1428575950531_0021
 CONTAINERID=container_1428575950531_0021_01_01
 2015-04-29 00:41:10,328 WARN  logaggregation.LogAggregationService 
 (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root 
 Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: 
 [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple 
 users.
 2015-04-29 00:41:10,328 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:init(182)) - rollingMonitorInterval is set as 
 -1. The log rolling mornitoring interval is disabled. The logs will be 
 aggregated after this application is finished.
 2015-04-29 00:41:10,351 INFO  application.Application 
 (ApplicationImpl.java:transition(304)) - Adding 
 container_1428575950531_0021_01_01 to application 
 application_1428575950531_0021
 2015-04-29 00:41:10,352 INFO  application.Application 
 (ApplicationImpl.java:handle(464)) - Application 
 application_1428575950531_0021 transitioned from INITING to RUNNING
 2015-04-29 00:41:10,356 INFO  container.Container 
 (ContainerImpl.java:handle(999)) - Container 
 container_1428575950531_0021_01_01 transitioned from NEW to LOCALIZING
 2015-04-29 00:41:10,357 INFO  containermanager.AuxServices 
 (AuxServices.java:handle(196)) - Got event CONTAINER_INIT for appId 
 application_1428575950531_0021
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 hdfs://zsexp/user/yarn/.slider/cluster/storm1/tmp/application_1428575950531_0021/am/lib/htrace-core-3.0.4.jar
  transitioned from INIT to DOWNLOADING
 2015-04-29 00:41:10,357 INFO  localizer.LocalizedResource 
 (LocalizedResource.java:handle(203)) - Resource 
 

[jira] [Commented] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546691#comment-14546691
 ] 

Hadoop QA commented on YARN-3489:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733319/YARN-3489-branch-2.7.03.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b0ad644 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7960/console |


This message was automatically generated.

 RMServerUtils.validateResourceRequests should only obtain queue info once
 -

 Key: YARN-3489
 URL: https://issues.apache.org/jira/browse/YARN-3489
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
  Labels: BB2015-05-RFC
 Attachments: YARN-3489-branch-2.7.02.patch, 
 YARN-3489-branch-2.7.03.patch, YARN-3489-branch-2.7.patch, 
 YARN-3489.01.patch, YARN-3489.02.patch, YARN-3489.03.patch


 Since the label support was added we now get the queue info for each request 
 being validated in SchedulerUtils.validateResourceRequest.  If 
 validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
 large cluster with lots of varied locality in the requests) then it will get 
 the queue info for each request.  Since we build the queue info this 
 generates a lot of unnecessary garbage, as the queue isn't changing between 
 requests.  We should grab the queue info once and pass it down rather than 
 building it again for each request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-05-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547010#comment-14547010
 ] 

Naganarasimha G R commented on YARN-3565:
-

Hi [~wangda], 
 The findbugs and whitespace issues are not related to the patch.

/cc [~aw],
I think whitespace is currently being calculated on the diff output rather 
than on just the modified lines (the diff has some lines before and after the 
modifications). git apply --whitespace=fix also passes for this patch.

 NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object 
 instead of String
 -

 Key: YARN-3565
 URL: https://issues.apache.org/jira/browse/YARN-3565
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
Priority: Blocker
 Attachments: YARN-3565-20150502-1.patch, YARN-3565.20150515-1.patch, 
 YARN-3565.20150516-1.patch


 Currently NM heartbeat/register uses Set<String>; it will be hard to add new fields if we 
 want to support specifying the NodeLabel type, such as exclusivity/constraints, 
 etc. We need to make sure rolling upgrade works.
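
 For illustration, a minimal sketch of why a structured label object is easier to 
 evolve than a plain Set<String>; this NodeLabelInfo class is a made-up stand-in, not 
 YARN's actual NodeLabel record.
 {code}
 import java.util.HashSet;
 import java.util.Set;

 public class NodeLabelInfo {
   private final String name;
   private final boolean exclusive;       // an attribute a plain String cannot carry

   public NodeLabelInfo(String name, boolean exclusive) {
     this.name = name;
     this.exclusive = exclusive;
   }

   public String getName() { return name; }
   public boolean isExclusive() { return exclusive; }

   public static void main(String[] args) {
     // A register/heartbeat request could then carry Set<NodeLabelInfo>
     // instead of Set<String>, leaving room for future attributes.
     Set<NodeLabelInfo> labels = new HashSet<>();
     labels.add(new NodeLabelInfo("gpu", true));
     System.out.println(labels.size());
   }
 }
 {code}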



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM

2015-05-16 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546971#comment-14546971
 ] 

sandflee commented on YARN-3644:


If the RM is down, the NM's connection will be reset by the RM machine. Could we catch 
this exception and keep the NM alive?

 Node manager shuts down if unable to connect with RM
 

 Key: YARN-3644
 URL: https://issues.apache.org/jira/browse/YARN-3644
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Srikanth Sundarrajan

 When the NM is unable to connect to the RM, the NM shuts itself down.
 {code}
 } catch (ConnectException e) {
   // catch and throw the exception if tried MAX wait time to connect RM
   dispatcher.getEventHandler().handle(
       new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
   throw new YarnRuntimeException(e);
 {code}
 In large clusters, if the RM is down for maintenance for a longer period, all the 
 NMs shut themselves down, requiring additional work to bring up the NMs.
 Setting yarn.resourcemanager.connect.wait-ms to -1 has other side 
 effects, where non-connection failures are retried infinitely by all 
 YarnClients (via RMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)