[jira] [Updated] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-1123:
------------------------------
    Attachment: YARN-1123-6.patch

+1. I created a patch which is almost the same, but fixes a minor format issue.

> [YARN-321] Adding ContainerReport and Protobuf implementation
> --------------------------------------------------------------
>                 Key: YARN-1123
>                 URL: https://issues.apache.org/jira/browse/YARN-1123
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Mayank Bansal
>         Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch, YARN-1123-4.patch, YARN-1123-5.patch, YARN-1123-6.patch
>
> Like YARN-978, we need a client-oriented class to expose the container history info. Neither Container nor RMContainer is the right one.
[jira] [Resolved] (YARN-1384) RMAppImpl#createApplicationState should call RMServerUtils#createApplicationState to convert the state
[ https://issues.apache.org/jira/browse/YARN-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-1384.
-------------------------------
    Resolution: Invalid

> RMAppImpl#createApplicationState should call RMServerUtils#createApplicationState to convert the state
> -------------------------------------------------------------------------------------------------------
>                 Key: YARN-1384
>                 URL: https://issues.apache.org/jira/browse/YARN-1384
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: haosdent
>            Priority: Minor
>
> RMAppImpl#createApplicationState should call RMServerUtils#createApplicationState to convert the state instead of duplicating the conversion code. Some code refactoring is required here.
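For context, the duplicated code in question is a plain state-machine mapping from the RM-internal RMAppState to the client-facing YarnApplicationState. Below is a hedged, illustrative sketch of such a conversion helper; the case arms are assumptions for illustration, not the code that was removed in YARN-540:

{code}
// Illustrative RMAppState -> YarnApplicationState conversion; the exact
// mapping here is an assumption, not the removed trunk implementation.
public static YarnApplicationState createApplicationState(RMAppState rmAppState) {
  switch (rmAppState) {
  case NEW:
    return YarnApplicationState.NEW;
  case SUBMITTED:
    return YarnApplicationState.SUBMITTED;
  case ACCEPTED:
    return YarnApplicationState.ACCEPTED;
  case RUNNING:
    return YarnApplicationState.RUNNING;
  case FINISHED:
    return YarnApplicationState.FINISHED;
  case KILLED:
    return YarnApplicationState.KILLED;
  case FAILED:
    return YarnApplicationState.FAILED;
  default:
    throw new YarnRuntimeException("Unknown state passed: " + rmAppState);
  }
}
{code}

Keeping one such helper and delegating to it from RMAppImpl is exactly the refactoring the issue asked for.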
[jira] [Commented] (YARN-1384) RMAppImpl#createApplicationState should call RMServerUtils#createApplicationState to convert the state
[ https://issues.apache.org/jira/browse/YARN-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812676#comment-13812676 ]

Zhijie Shen commented on YARN-1384:
-----------------------------------
I've validated the problem again. RMServerUtils#createApplicationState has already been removed from trunk since YARN-540; branch YARN-321 brings it back, so we may have had some problem when merging branch-2 into YARN-321. Closing this as invalid for now; if we need to fix the duplicate code when merging YARN-321 back to branch-2, let's reopen it. Anyway, thanks for your effort, [~haosd...@gmail.com]!
[jira] [Commented] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812679#comment-13812679 ]

Hadoop QA commented on YARN-1123:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12611931/YARN-1123-6.patch
against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2359//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2359//console

This message is automatically generated.
[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
[ https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812682#comment-13812682 ]

Zhijie Shen commented on YARN-978:
----------------------------------
+1

> [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
> -----------------------------------------------------------------------
>                 Key: YARN-978
>                 URL: https://issues.apache.org/jira/browse/YARN-978
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: YARN-321
>         Attachments: YARN-978-1.patch, YARN-978.10.patch, YARN-978.2.patch, YARN-978.3.patch, YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, YARN-978.8.patch, YARN-978.9.patch
>
> We don't have an ApplicationAttemptReport and Protobuf implementation. Adding that.
> Thanks,
> Mayank
[jira] [Created] (YARN-1388) fair share does not display info in the scheduler page
Liyin Liang created YARN-1388:
---------------------------------
             Summary: fair share does not display info in the scheduler page
                 Key: YARN-1388
                 URL: https://issues.apache.org/jira/browse/YARN-1388
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.2.1
            Reporter: Liyin Liang
[jira] [Updated] (YARN-1388) fair share does not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Liang updated YARN-1388:
------------------------------
    Description: YARN-1044 fixed the min/max/used resource display problem in the scheduler page, but Fair Share has the same problem and needs to be fixed as well.
[jira] [Updated] (YARN-1388) fair share does not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Liang updated YARN-1388:
------------------------------
    Attachment: yarn-1388.diff
[jira] [Commented] (YARN-1388) fair share does not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812828#comment-13812828 ]

Hadoop QA commented on YARN-1388:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12611950/yarn-1388.diff
against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2360//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2360//console

This message is automatically generated.
[jira] [Commented] (YARN-1388) fair share does not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812975#comment-13812975 ]

Sandy Ryza commented on YARN-1388:
----------------------------------
+1
[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813079#comment-13813079 ]

Vinod Kumar Vavilapalli commented on YARN-1320:
-----------------------------------------------
I doubt the patch is going to work if the remote file-system is HDFS. The propagation of the log4j properties file is via HDFS, and it doesn't look like it is handled correctly. Please check.

> Custom log4j properties in Distributed shell does not work properly.
> ---------------------------------------------------------------------
>                 Key: YARN-1320
>                 URL: https://issues.apache.org/jira/browse/YARN-1320
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications/distributed-shell
>            Reporter: Tassapol Athiapinya
>            Assignee: Xuan Gong
>             Fix For: 2.2.1
>         Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.4.patch, YARN-1320.5.patch, YARN-1320.6.patch, YARN-1320.6.patch, YARN-1320.7.patch
>
> Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties.
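For reference, a minimal sketch of the usual way a client ships a custom log4j.properties so it survives on a remote file-system: copy it to the application's staging directory on HDFS, then register it as a LocalResource. The variable names (log4jPropFile, appName, appId, localResources) are assumptions for illustration, not the YARN-1320 patch itself:

{code}
// Copy the local -log_properties file to the remote FS so the NM can fetch it.
FileSystem fs = FileSystem.get(conf);
Path src = new Path(log4jPropFile);
Path dst = new Path(fs.getHomeDirectory(), appName + "/" + appId + "/log4j.properties");
fs.copyFromLocalFile(false, true, src, dst);

// Register it as a LocalResource keyed by the name the container expects;
// note the URL must point at the remote (HDFS) copy, not the local path.
FileStatus status = fs.getFileStatus(dst);
LocalResource log4jRsrc = LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(dst),
    LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
    status.getLen(), status.getModificationTime());
localResources.put("log4j.properties", log4jRsrc);
{code}

If the patch builds the LocalResource URL from the client-local path instead of the HDFS copy, localization would fail exactly the way the comment above suspects.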
[jira] [Commented] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813089#comment-13813089 ]

Zhijie Shen commented on YARN-979:
----------------------------------
The patch is almost good, with the following minor issues:
* The following javadoc is inconsistent with ApplicationAttemptReport (YARN-978):
{code}
+ *   <li>host - set to N/A</li>
+ *   <li>RPC port - set to -1</li>
+ *   <li>client token - set to N/A</li>
+ *   <li>diagnostics - set to N/A</li>
+ *   <li>tracking URL - set to N/A</li>
{code}
* As is mentioned in the other two jiras, please move "GetApplicationAttemptReportRequestProtoOrBuilder p = viaProto ? proto : builder;" later:
{code}
+  @Override
+  public ApplicationAttemptId getApplicationAttemptId() {
+    GetApplicationAttemptReportRequestProtoOrBuilder p =
+        viaProto ? proto : builder;
+    if (this.applicationAttemptId != null) {
+      return this.applicationAttemptId;
+    }
+    if (!p.hasApplicationAttemptId()) {
+      return null;
+    }
+    this.applicationAttemptId =
+        convertFromProtoFormat(p.getApplicationAttemptId());
+    return this.applicationAttemptId;
+  }
{code}
* You need to change hadoop-yarn-api/pom.xml so that application_history_client.proto gets compiled.

In addition to the patch's issues, I'd like to raise one design issue here, projecting some future problems. This patch adds different APIs for application/attempt/container, which is going to be a superset of the APIs of ApplicationClientProtocol. That's OK as long as we restrict the problem to the AHS domain. However, in the future we'd probably like to integrate ApplicationHistoryProtocol with ApplicationClientProtocol. In other words, from the users' point of view, they may inquire about any application using a client, which makes it transparent whether the application report is received via ApplicationClientProtocol (if the application is running) or via ApplicationHistoryProtocol (if it is done). Then ApplicationClientProtocol's and ApplicationHistoryProtocol's APIs mismatch: users can inquire about finished attempts/containers, but not the running ones. ApplicationClientProtocol may need to add the APIs for attempt/container as well. Alternatively, the API design could keep only getApplicationReport(), with options to load all attempt/container reports or not. Just thinking out loud: personally, I'm inclined toward the current API design, which is more flexible, but I'm a bit concerned about the future integration. Thoughts?

> [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol
> ----------------------------------------------------------------------------------------------------
>                 Key: YARN-979
>                 URL: https://issues.apache.org/jira/browse/YARN-979
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: YARN-979-1.patch, YARN-979-3.patch, YARN-979-4.patch, YARN-979.2.patch
>
> ApplicationHistoryProtocol should have the following APIs as well:
> * getApplicationAttemptReport
> * getApplicationAttempts
> * getContainerReport
> * getContainers
> The corresponding request and response classes need to be added as well.
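To make the second review point concrete, here is the getter after the suggested reordering: the proto-or-builder is resolved only when the cached field is absent, so cached reads never touch the protobuf objects. This is a sketch assuming the usual PBImpl fields (proto, builder, viaProto) shown in the patch:

{code}
@Override
public ApplicationAttemptId getApplicationAttemptId() {
  // Serve from the cache first; no need to resolve proto/builder for that.
  if (this.applicationAttemptId != null) {
    return this.applicationAttemptId;
  }
  GetApplicationAttemptReportRequestProtoOrBuilder p =
      viaProto ? proto : builder;
  if (!p.hasApplicationAttemptId()) {
    return null;
  }
  this.applicationAttemptId =
      convertFromProtoFormat(p.getApplicationAttemptId());
  return this.applicationAttemptId;
}
{code}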
[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1121:
--------------------------
    Attachment: YARN-1121.4.patch

- Add a separate drainingStop flag to indicate serviceStop() is called for draining.
- Move setDrainingStop() to RMStateStore.serviceInit().

> RMStateStore should flush all pending store events before closing
> -------------------------------------------------------------------
>                 Key: YARN-1121
>                 URL: https://issues.apache.org/jira/browse/YARN-1121
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Jian He
>             Fix For: 2.2.1
>         Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch
>
> On serviceStop, it should wait for all internal pending events to drain before stopping.
[jira] [Updated] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated YARN-1210:
------------------------------------
    Attachment: YARN-1210.2.patch

> During RM restart, RM should start a new attempt only when previous attempt exits for real
> --------------------------------------------------------------------------------------------
>                 Key: YARN-1210
>                 URL: https://issues.apache.org/jira/browse/YARN-1210
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1210.1.patch, YARN-1210.2.patch
>
> When RM recovers, it can wait for existing AMs to contact RM back and then kill them forcefully before even starting a new AM. Worst case, RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help issues with downstream components like Pig, Hive and Oozie during RM restart.
> Meanwhile, new apps will proceed as usual as existing apps wait for recovery.
> This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt.
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813109#comment-13813109 ]

Omkar Vinit Joshi commented on YARN-1210:
-----------------------------------------
Completely removed the RECOVERED state; the rest of the patch is the same. The only major differences are:
* Before launching a new appAttempt, the RM will check whether any of the application attempts was running before. If so, the RM will wait instead of starting a new application attempt. If no application attempt is found in a running state (anything other than a final state), it launches a new application attempt. (See the sketch below.)
* When the NodeManager receives the resync signal, it kills all the running containers and then reports the killed containers to the RM during RM registration. On receiving the container information, the RM checks whether any of the reported containers is an AM container. If so, it sends a container-failed event to the related app attempt, which eventually starts a new application attempt.
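A hedged sketch of the first check described above: launch a fresh attempt only when every recovered attempt has reached a final state. The helper name is hypothetical; RMApp#getAppAttempts() and the attempt states are the existing RM interfaces:

{code}
// Hypothetical recovery-time check, not the YARN-1210.2.patch code.
private boolean hasLiveRecoveredAttempt(RMApp app) {
  for (RMAppAttempt attempt : app.getAppAttempts().values()) {
    RMAppAttemptState state = attempt.getAppAttemptState();
    if (state != RMAppAttemptState.FINISHED
        && state != RMAppAttemptState.FAILED
        && state != RMAppAttemptState.KILLED) {
      return true;   // a previous attempt may still be running; wait for it
    }
  }
  return false;      // all prior attempts exited for real; safe to launch a new one
}
{code}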
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813115#comment-13813115 ]

Omkar Vinit Joshi commented on YARN-1210:
-----------------------------------------
Cancelled the patch as it is based on YARN-674.
[jira] [Commented] (YARN-1388) fair share does not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813118#comment-13813118 ]

Sangjin Lee commented on YARN-1388:
-----------------------------------
Looks good to me. Thanks for the patch!
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813126#comment-13813126 ]

Hadoop QA commented on YARN-1121:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12611996/YARN-1121.4.patch
against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2361//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2361//console

This message is automatically generated.
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813139#comment-13813139 ]

Luke Lu commented on YARN-311:
------------------------------
[~djp]: Unfortunately YARN-1343 got in before I tried to merge the patch. Now the patch won't compile, due to the old RMNodeImpl ctor usage in TestRMNodeTransition. Can you rebase the patch?

> Dynamic node resource configuration: core scheduler changes
> -------------------------------------------------------------
>                 Key: YARN-311
>                 URL: https://issues.apache.org/jira/browse/YARN-311
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-311-v1.patch, YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch
>
> As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira will only contain changes in the scheduler.
> The flow to update a node's resource, and the awareness of it in resource scheduling, is:
> 1. A resource update comes through the admin API to the RM and takes effect on RMNodeImpl.
> 2. When the next NM heartbeat for updating status comes, the RMNode's resource change is picked up, and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens.
> 3. The scheduler does resource allocation according to the new availableResource in the SchedulerNode.
> For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291.
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813183#comment-13813183 ]

Vinod Kumar Vavilapalli commented on YARN-90:
---------------------------------------------
Thanks for the patch, Song! Some quick comments:
- Because you are changing the semantics of checkDirs(), there are more changes that are needed:
-- updateDirsAfterFailure() -> updateConfAfterDirListChange?
-- The log message in updateDirsAfterFailure, "Disk(s) failed.", should be changed to something like "Disk-health report changed:".
- Web UI and web-services are fine for now, I think; nothing to do there.
- Drop the extraneous System.out.println lines throughout the patch.
- Let's drop the metrics changes. We need to expose this end-to-end and not just via metrics: client-side reports, JMX and metrics. Worth tracking that effort separately.
- Tests:
-- testAutoDir() -> testDisksGoingOnAndOff?
-- Can you also validate the health report both when disks go off and when they come back again?
-- Also, just throw unwanted exceptions instead of catching them and printing stack traces.

> NodeManager should identify failed disks becoming good back again
> -------------------------------------------------------------------
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>         Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch
>
> MAPREDUCE-3121 makes the NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), the NodeManager needs a restart. This JIRA is to improve the NodeManager to reuse good disks (which could have been bad some time back).
[jira] [Commented] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813196#comment-13813196 ]

Mayank Bansal commented on YARN-979:
------------------------------------
[~zjshen] Thanks for the review.

bq. You need to change hadoop-yarn-api/pom.xml so that application_history_client.proto gets compiled.

It's already there.

bq. In addition to the patch's issues, I'd like to raise one design issue here, projecting some future problems. [...] ApplicationClientProtocol may need to add the APIs for attempt/container as well. [...] Thoughts?

I will create a jira for making ApplicationClientProtocol similar to ApplicationHistoryProtocol.

Thanks,
Mayank
[jira] [Commented] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813197#comment-13813197 ]

Mayank Bansal commented on YARN-979:
------------------------------------
The rest of the comments have been incorporated.

Thanks,
Mayank
[jira] [Updated] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated YARN-979:
-------------------------------
    Attachment: YARN-979-5.patch

Updating the latest patch.

Thanks,
Mayank
[jira] [Updated] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1222:
-----------------------------------
    Attachment: yarn-1222-4.patch

Updating the patch. If HA is enabled, when any ZK operation results in KeeperException.NoAuthException, the RM is automatically transitioned to Standby state. Added a unit test to verify that fencing works.

> Make improvements in ZKRMStateStore for fencing
> ------------------------------------------------
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch
>
> Using multi-operations for every ZK interaction. In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.
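A minimal sketch of the fencing scheme described above, assuming a ZooKeeper handle zk, ACLs (zkAcl) that only the active RM may create/delete under the root znode, a fencingNodePath child of that root, and a hypothetical transitionToStandby() hook. This is illustrative, not the yarn-1222-4.patch code:

{code}
// Every state-store write is wrapped in a multi() that creates and deletes
// a fencing lock znode; a fenced (standby) RM fails with NoAuthException
// instead of writing stale data.
void fencedWrite(String appNodePath, byte[] appStateData) throws Exception {
  List<Op> ops = new ArrayList<Op>();
  ops.add(Op.create(fencingNodePath, new byte[0], zkAcl, CreateMode.PERSISTENT));
  ops.add(Op.setData(appNodePath, appStateData, -1));  // the actual store write
  ops.add(Op.delete(fencingNodePath, -1));
  try {
    zk.multi(ops);  // all-or-nothing: fails if we no longer own the root znode
  } catch (KeeperException.NoAuthException nae) {
    transitionToStandby();  // the other RM changed the ACLs: we have been fenced
  }
}
{code}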
[jira] [Commented] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813205#comment-13813205 ]

Hadoop QA commented on YARN-979:
--------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12612013/YARN-979-5.patch
against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2363//console

This message is automatically generated.
[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813274#comment-13813274 ]

Hadoop QA commented on YARN-1222:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12612014/yarn-1222-4.patch
against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2364//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2364//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2364//console

This message is automatically generated.
[jira] [Commented] (YARN-1378) Implement a RMStateStore cleaner for deleting application/attempt info
[ https://issues.apache.org/jira/browse/YARN-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813299#comment-13813299 ]

Jian He commented on YARN-1378:
-------------------------------
Hi [~ozawa], this jira is oriented only toward periodically cleaning app/attempt data in the state store; it should not block or be blocked by them, but it may need a code-level rebase.

> Implement a RMStateStore cleaner for deleting application/attempt info
> ------------------------------------------------------------------------
>                 Key: YARN-1378
>                 URL: https://issues.apache.org/jira/browse/YARN-1378
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-1378.1.patch
>
> Now that we are storing the final state of application/attempt instead of removing application/attempt info on application/attempt completion (YARN-891), we need a separate RMStateStore cleaner for cleaning the application/attempt state.
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813301#comment-13813301 ]

Zhijie Shen commented on YARN-1121:
-----------------------------------
One typo: setDraningStop -> setDrainingStop
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813305#comment-13813305 ]

Jian He commented on YARN-1121:
-------------------------------
bq. One typo: setDraningStop -> setDrainingStop

Nice catch! Will fix it in the next patch.
[jira] [Created] (YARN-1389) Merging the ApplicationClientProtocol and ApplicationHistoryProtocol
Mayank Bansal created YARN-1389:
-----------------------------------
             Summary: Merging the ApplicationClientProtocol and ApplicationHistoryProtocol
                 Key: YARN-1389
                 URL: https://issues.apache.org/jira/browse/YARN-1389
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Mayank Bansal
            Assignee: Zhijie Shen

It seems to be expensive to maintain a big number of outstanding t-file writers. RM is likely to run out of the I/O resources. Probably we'd like to limit the number of concurrent outstanding t-file writers, and queue the writing requests.
[jira] [Assigned] (YARN-1389) Merging the ApplicationClientProtocol and ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal reassigned YARN-1389:
-----------------------------------
    Assignee: Mayank Bansal (was: Zhijie Shen)
[jira] [Updated] (YARN-1389) Merging the ApplicationClientProtocol and ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated YARN-1389:
--------------------------------
    Description: At some point we need more info in ApplicationClientProtocol which we have in ApplicationHistoryProtocol. We need to merge those.

  was: It seems to be expensive to maintain a big number of outstanding t-file writers. RM is likely to run out of the I/O resources. Probably we'd like to limit the number of concurrent outstanding t-file writers, and queue the writing requests.
[jira] [Commented] (YARN-954) [YARN-321] History Service should create the webUI and wire it to HistoryStorage
[ https://issues.apache.org/jira/browse/YARN-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813320#comment-13813320 ]

Mayank Bansal commented on YARN-954:
------------------------------------
[~devaraj.k] Hi Devaraj, some changes have been made on the YARN-321 branch recently, and we want to get this patch in ASAP. Can you please make the changes, or shall I take this up?

Thanks,
Mayank

> [YARN-321] History Service should create the webUI and wire it to HistoryStorage
> ----------------------------------------------------------------------------------
>                 Key: YARN-954
>                 URL: https://issues.apache.org/jira/browse/YARN-954
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Devaraj K
>         Attachments: YARN-954-3.patch, YARN-954-v0.patch, YARN-954-v1.patch, YARN-954-v2.patch
[jira] [Commented] (YARN-1023) [YARN-321] Webservices REST API's support for Application History
[ https://issues.apache.org/jira/browse/YARN-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813322#comment-13813322 ]

Mayank Bansal commented on YARN-1023:
-------------------------------------
[~devaraj.k] Hi Devaraj, some changes have been made on the YARN-321 branch recently, and we want to get this patch in ASAP. Can you please make the changes, or shall I take this up?

Thanks,
Mayank

> [YARN-321] Webservices REST API's support for Application History
> -------------------------------------------------------------------
>                 Key: YARN-1023
>                 URL: https://issues.apache.org/jira/browse/YARN-1023
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: YARN-321
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>         Attachments: YARN-1023-v0.patch, YARN-1023-v1.patch
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813337#comment-13813337 ]

Vinod Kumar Vavilapalli commented on YARN-1374:
-----------------------------------------------
I agree with both sides. But more to the last point that Karthik made: monitors are getting added to the RM directly, though that wasn't the intention. +1 for this patch as it fixes that issue. Let's file a separate ticket for the CompositeService issue.

Checking this in.

> Resource Manager fails to start due to ConcurrentModificationException
> ------------------------------------------------------------------------
>                 Key: YARN-1374
>                 URL: https://issues.apache.org/jira/browse/YARN-1374
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Devaraj K
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: yarn-1374-1.patch, yarn-1374-1.patch
>
> Resource Manager is failing to start with the below ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>     at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>     at java.util.AbstractList$Itr.next(AbstractList.java:343)
>     at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>     at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby
> 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
> java.util.ConcurrentModificationException
>     at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>     at java.util.AbstractList$Itr.next(AbstractList.java:343)
>     at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>     at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> ************************************************************/
> {code}
[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813346#comment-13813346 ]

Steve Loughran commented on YARN-1374:
--------------------------------------
[~bikassaha] If we clone the list before iterating, the newly added siblings won't cause problems during the init or start operations; they won't get called. But if you then add an uninited service during init, it won't get inited, and if you add an uninited or inited service during start, it won't get started.

Maybe: allow an addition, but the service you add must always be in the same state as the composite service. That way, if you do add a new service, you have to get it into the correct state before the add() call.
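To make the trade-off concrete, a sketch of the clone-before-iterate variant under discussion; the serviceList field is hypothetical and this is not the committed CompositeService code:

{code}
// Iterating over a defensive copy avoids the ConcurrentModificationException,
// at the cost described above: a service added while this loop runs is not
// inited by it, so the caller must get it into the right state itself.
protected void serviceInit(Configuration conf) throws Exception {
  List<Service> snapshot = new ArrayList<Service>(serviceList);
  for (Service service : snapshot) {
    service.init(conf);
  }
  super.serviceInit(conf);
}
{code}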
[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1121:
--------------------------
    Attachment: YARN-1121.5.patch

- Fixed the typo.
- Added a new DrainEventHandler for ignoring events while draining to stop.
- Created a new field handlerInstance for recording the earlier handler instance.
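For readers following along, an illustrative sketch of what a draining stop looks like in an async dispatcher. The field names (drainEventsOnStop, blockNewEvents, eventQueue, eventHandlingThread) are assumptions for illustration, not the YARN-1121.5.patch itself:

{code}
// On a draining stop: refuse new events, wait for the queue to empty so all
// pending store events are flushed, then shut the handler thread down.
protected void serviceStop() throws Exception {
  if (drainEventsOnStop) {
    blockNewEvents = true;           // the DrainEventHandler ignores new events
    while (!eventQueue.isEmpty()) {
      Thread.sleep(100);             // wait for pending store events to flush
    }
  }
  stopped = true;
  eventHandlingThread.interrupt();
  eventHandlingThread.join();
  super.serviceStop();
}
{code}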
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813409#comment-13813409 ]

Hadoop QA commented on YARN-1121:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12612047/YARN-1121.5.patch
against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2365//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2365//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2365//console

This message is automatically generated.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813463#comment-13813463 ]

Sandy Ryza commented on YARN-445:
---------------------------------
In 0.21, when a task was going to be killed due to a timeout, a SIGQUIT would be sent to it to dump its stacks to standard out (MAPREDUCE-1119). This was a useful feature that I'm currently working on backporting to branch-1 in MAPREDUCE-5592. It would be good to make sure that whatever we do here can accommodate something similar.

> Ability to signal containers
> -----------------------------
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jason Lowe
>            Assignee: Andrey Klochkov
>         Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch
>
> It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes.
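As a concrete illustration of the jstack-on-timeout idea, a hedged sketch of SIGQUIT-before-kill; the pid, the grace period, and the helper name are assumptions, and this is not the MAPREDUCE-1119 code:

{code}
// Ask the task JVM to dump its thread stacks (SIGQUIT makes a JVM write them
// to stdout), give it a moment to finish writing, then proceed with the kill.
void dumpStacksThenKill(String pid, long gracePeriodMs) throws Exception {
  Runtime.getRuntime().exec(new String[] {"kill", "-QUIT", pid}).waitFor();
  Thread.sleep(gracePeriodMs);
  Runtime.getRuntime().exec(new String[] {"kill", "-TERM", pid}).waitFor();
}
{code}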
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813464#comment-13813464 ]

Sandy Ryza commented on YARN-445:
---------------------------------
To expand on that, it would be nice for SIGQUIT-then-SIGTERM-then-SIGKILL not to require multiple RPCs.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813468#comment-13813468 ] Jason Lowe commented on YARN-445: - However, it would also be nice not to always tie SIGQUIT to SIGTERM/SIGKILL. I'd love to give users the ability to diagnose tasks by themselves without killing them in the process. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However, that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
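To make the shape of the API under discussion concrete, here is a minimal sketch of a signal request that is decoupled from stopping the container; all names below are hypothetical illustrations, not the interface actually added for YARN-445:

{code}
// Hypothetical sketch only: illustrative types, not the YARN-445 patch.
public final class SignalContainerSketch {

  /** Signals an AM might want delivered to a container process. */
  enum Signal { QUIT, TERM, KILL, USR1, USR2 }

  /** Names the target container and the exact signal, with no implied kill. */
  static final class SignalContainerRequest {
    private final String containerId;
    private final Signal signal;

    SignalContainerRequest(String containerId, Signal signal) {
      this.containerId = containerId;
      this.signal = signal;
    }

    String getContainerId() { return containerId; }
    Signal getSignal() { return signal; }
  }

  public static void main(String[] args) {
    // SIGQUIT alone: dump stacks to stdout without killing the task,
    // which covers the diagnose-without-kill case discussed above.
    SignalContainerRequest jstack = new SignalContainerRequest(
        "container_1385000000000_0001_01_000002", Signal.QUIT);
    System.out.println(jstack.getContainerId() + " <- " + jstack.getSignal());
  }
}
{code}

A QUIT-then-TERM-then-KILL sequence could then be expressed as an ordered list of such signals in a single RPC, which would also address the concern about multiple round trips.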
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813470#comment-13813470 ] Sandy Ryza commented on YARN-445: - Very true. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However, that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813471#comment-13813471 ] Sandy Ryza commented on YARN-445: - Oops, I didn't realize that feature was the original motivator for this JIRA. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445--n4.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However, that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Attachment: YARN-1266-2.patch Cleaning up the patch and moving the rest of the stuff to the corresponding JIRAs. Thanks, Mayank Adding ApplicationHistoryProtocolPBService to make web apps work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch Adding ApplicationHistoryProtocolPBService to make web apps work and changing yarn to run the AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813500#comment-13813500 ] Jian He commented on YARN-1210: --- - Instead of passing running containers as a parameter in RegisterNodeManagerRequest, is it possible to just call heartBeat immediately after registerCall and then unBlockNewContainerRequests? That way we can take advantage of the existing heartbeat logic and cover other things, like keeping the app alive for log aggregation after the AM container completes. -- Or at least we can send the list of ContainerStatus (including diagnostics) instead of just container Ids, and also the list of keep-alive apps (separate jira)? - Unnecessary import changes in DefaultContainerExecutor.java and LinuxContainerExecutor, ContainerLaunch, ContainersLauncher - Finished containers may not necessarily be killed. The containers can also finish normally and remain in the NM cache before NM resync. {code} RMAppAttemptContainerFinishedEvent evt = new RMAppAttemptContainerFinishedEvent(appAttemptId, ContainerStatus.newInstance(cId, ContainerState.COMPLETE, "Killed due to RM restart", ExitCode.FORCE_KILLED.getExitCode())); {code} - wrong LOG class name. {code} private static final Log LOG = LogFactory.getLog(RMAppImpl.class); {code} - Isn't it always the case that after this patch only the last attempt can be running? A new attempt will not be launched until the previous attempt reports back that it really exits. If this is the case, it can be a bug. We may only need to check whether the last attempt is finished or not. {code} // check if any application attempt was running // if yes then don't start new application attempt. for (Entry<ApplicationAttemptId, RMAppAttempt> attempt : app.attempts.entrySet()) { boolean appAttemptInFinalState = RMAppAttemptImpl.isAttemptInFinalState(attempt.getValue()); LOG.info("attempt : " + attempt.getKey().toString() + " in final state : " + appAttemptInFinalState); if (!appAttemptInFinalState) { // One of the application attempts is not in a final state. // Not starting new application attempt. return RMAppState.RUNNING; } } {code} - should we return RUNNING or ACCEPTED for apps that are not in a final state? It's ok to return RUNNING in the scope of this patch because anyway we are launching a new attempt. Later on, in work-preserving restart, the RM can crash before the attempt registers; the attempt can register with the RM after the RM comes back, in which case we can then move the app from ACCEPTED to RUNNING? During RM restart, RM should start a new attempt only when previous attempt exits for real -- Key: YARN-1210 URL: https://issues.apache.org/jira/browse/YARN-1210 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-1210.1.patch, YARN-1210.2.patch When RM recovers, it can wait for existing AMs to contact RM back and then kill them forcefully before even starting a new AM. Worst case, RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help with issues in downstream components like Pig, Hive and Oozie during RM restart. In the meanwhile, new apps will proceed as usual as existing apps wait for recovery. This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt. -- This message was sent by Atlassian JIRA (v6.1#6144)
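If the review's assumption holds - after this patch only the most recent attempt can still be running - the loop above collapses to a single check on the current attempt. A rough sketch of that simplification, using stand-in types rather than the real RMAppImpl code:

{code}
import java.util.EnumSet;

// Stand-in sketch of the reviewer's suggestion; the actual
// RMAppImpl/RMAppAttempt classes are more involved.
class LastAttemptCheckSketch {
  enum AttemptState { NEW, RUNNING, FINISHED, FAILED, KILLED }

  private static final EnumSet<AttemptState> FINAL_STATES =
      EnumSet.of(AttemptState.FINISHED, AttemptState.FAILED, AttemptState.KILLED);

  /** If only the last attempt can be running, checking it alone suffices. */
  static boolean canStartNewAttempt(AttemptState lastAttemptState) {
    return FINAL_STATES.contains(lastAttemptState);
  }

  public static void main(String[] args) {
    System.out.println(canStartNewAttempt(AttemptState.RUNNING));  // false
    System.out.println(canStartNewAttempt(AttemptState.FINISHED)); // true
  }
}
{code}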
[jira] [Commented] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813505#comment-13813505 ] Hadoop QA commented on YARN-1266: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612074/YARN-1266-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2366//console This message is automatically generated. Adding ApplicationHistoryProtocolPBService to make web apps work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch Adding ApplicationHistoryProtocolPBService to make web apps work and changing yarn to run the AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1323) Set HTTPS webapp address along with other RPC addresses
[ https://issues.apache.org/jira/browse/YARN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813529#comment-13813529 ] Sandy Ryza commented on YARN-1323: -- +1 Set HTTPS webapp address along with other RPC addresses --- Key: YARN-1323 URL: https://issues.apache.org/jira/browse/YARN-1323 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1323-1.patch YARN-1232 adds the ability to configure multiple RMs, but missed out the https web app address. Need to add that in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1388) fair share do not display info in the scheduler page
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813540#comment-13813540 ] Liyin Liang commented on YARN-1388: --- Because it is a small UI change, the patch didn't add new tests. Manual steps to verify this patch: 1. Configure RM to use FairScheduler 2. Go to the scheduler page in RM 3. Click any queue to display the detailed info 4. Without this patch, the fair share entry does not display info 5. With this patch, the fair share entry shows memory and vcore info fair share do not display info in the scheduler page Key: YARN-1388 URL: https://issues.apache.org/jira/browse/YARN-1388 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.1 Reporter: Liyin Liang Attachments: yarn-1388.diff YARN-1044 fixed the min/max/used resource display problem in the scheduler page. But the Fair Share has the same problem and needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1323) Set HTTPS webapp address along with other RPC addresses in HAUtil
[ https://issues.apache.org/jira/browse/YARN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1323: - Summary: Set HTTPS webapp address along with other RPC addresses in HAUtil (was: Set HTTPS webapp address along with other RPC addresses) Set HTTPS webapp address along with other RPC addresses in HAUtil - Key: YARN-1323 URL: https://issues.apache.org/jira/browse/YARN-1323 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1323-1.patch YARN-1232 adds the ability to configure multiple RMs, but missed out the https web app address. Need to add that in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813547#comment-13813547 ] Junping Du commented on YARN-311: - Sure. Will update the patch soon. Thx! Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira will only contain changes in the scheduler. The flow to update a node's resource and awareness in resource scheduling is: 1. Resource update is through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM heartbeat for updating status comes, the RMNode's resource change will be noticed and the delta resource is added to the schedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813560#comment-13813560 ] Omkar Vinit Joshi commented on YARN-674: Thanks [~vinodkv] for the review... bq. Does this patch also include YARN-1210? Seems like it, we should separate that code. No.. anything specific? YARN-1210 is more about waiting for the older AM to finish before launching a new AM. bq. Depending on the final patch, I think we should split RMAppManager.submitApp into two, one for regular submit and one for submit after recovery. Splitting the method into 2. * submitApplication - normal application submission * submitRecoveredApplication - submitting a recovered application bq. RMAppState.java change is unnecessary. Fixed. bq. ForwardingEventHandler is a bottleneck for renewals now - especially during submission. We need to have a thread pool. Created a fixed thread pool service with the thread count controllable via configuration (not adding this to yarn-default). Keeping the default thread count at 5. Fair enough? bq. Once we do the above, the old concurrency test should be added back. Yeah.. added that test back. bq. We are undoing most of YARN-1107. Good that we laid the groundwork there. Let's make sure we remove all the dead code. One comment stands out Did I miss anything here? I didn't understand. The comment I have not removed, as it is still valid. bq. The newly added test can have race conditions? We may be lucky in the test, but in a real life scenario, the client has to submit the app and poll for app failure due to invalid tokens I think it will not. For clients, yes, after they submit the application they will have to keep polling to know the status of the application (got accepted or failed due to token renewal). bq. Similarly we should add a test for successful submission after renewal. Sure, added one.. checking for RMAppEvent.START Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
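For context on the thread-pool change described above, here is a minimal sketch of a fixed-size renewal pool; the default of 5 comes from the comment, but the names are illustrative assumptions, not the ones in the actual patch:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of a configurable fixed thread pool for token
// renewals; class and method names are assumptions, not the patch's API.
class RenewerPoolSketch {
  static final int DEFAULT_THREAD_COUNT = 5;

  private final ExecutorService pool;

  RenewerPoolSketch(int configuredThreadCount) {
    // Renewals block on RPC to the NameNode, so multiple workers keep one
    // slow or down NameNode from stalling all application submissions.
    int size = configuredThreadCount > 0 ? configuredThreadCount : DEFAULT_THREAD_COUNT;
    pool = Executors.newFixedThreadPool(size);
  }

  void submitRenewal(Runnable renewal) {
    pool.execute(renewal);
  }

  void stop() throws InterruptedException {
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
{code}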
[jira] [Updated] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-674: --- Attachment: YARN-674.5.patch Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813562#comment-13813562 ] Jian He commented on YARN-1279: --- - LogAggregationState: DISABLE -> DISABLED, NOT_START -> NOT_STARTED - Log aggregation is an NM side config; this is getting it from the RM itself. {code} if (!conf.getBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, YarnConfiguration.DEFAULT_LOG_AGGREGATION_ENABLED)) { return LogAggregationState.DISABLE; } {code} - LogAggregationStatus may come via heartbeat before FinalTransition is called, inside which containerLogAggregationStatus is initialized with the containers. In this case, the log status is lost. {code} public void updateLogAggregationStatus(ContainerLogAggregationStatus status) { this.writeLock.lock(); try { if (containerLogAggregationStatus.containsKey(status.getContainerId())) { LogAggregationState currentState = containerLogAggregationStatus.get(status.getContainerId()); if (currentState != LogAggregationState.COMPLETED && currentState != LogAggregationState.FAILED) { if (status.getLogAggregationState() == LogAggregationState.COMPLETED) { LogAggregationCompleted.getAndAdd(1); } else if (status.getLogAggregationState() == LogAggregationState.FAILED) { LogAggregationFailed.getAndAdd(1); } containerLogAggregationStatus.put(status.getContainerId(), status.getLogAggregationState()); } } } finally { this.writeLock.unlock(); } } {code} Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1323) Set HTTPS webapp address along with other RPC addresses in HAUtil
[ https://issues.apache.org/jira/browse/YARN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813564#comment-13813564 ] Hudson commented on YARN-1323: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4692 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4692/]) YARN-1323. Set HTTPS webapp address along with other RPC addresses in HAUtil (Karthik Kambatla via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538851) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java Set HTTPS webapp address along with other RPC addresses in HAUtil - Key: YARN-1323 URL: https://issues.apache.org/jira/browse/YARN-1323 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Fix For: 2.3.0 Attachments: yarn-1323-1.patch YARN-1232 adds the ability to configure multiple RMs, but missed out the https web app address. Need to add that in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1388) Fair Scheduler page always displays blank fair share
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1388: - Summary: Fair Scheduler page always displays blank fair share (was: fair share do not display info in the scheduler page) Fair Scheduler page always displays blank fair share Key: YARN-1388 URL: https://issues.apache.org/jira/browse/YARN-1388 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.1 Reporter: Liyin Liang Attachments: yarn-1388.diff YARN-1044 fixed the min/max/used resource display problem in the scheduler page. But the Fair Share has the same problem and needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1388) Fair Scheduler page always displays blank fair share
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813566#comment-13813566 ] Sandy Ryza commented on YARN-1388: -- I just committed this. Thanks [~liangly]! Fair Scheduler page always displays blank fair share Key: YARN-1388 URL: https://issues.apache.org/jira/browse/YARN-1388 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.1 Reporter: Liyin Liang Assignee: Liyin Liang Fix For: 2.2.1 Attachments: yarn-1388.diff YARN-1044 fixed the min/max/used resource display problem in the scheduler page. But the Fair Share has the same problem and needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-311: Attachment: YARN-311-v13.patch Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira will only contain changes in the scheduler. The flow to update a node's resource and awareness in resource scheduling is: 1. Resource update is through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM heartbeat for updating status comes, the RMNode's resource change will be noticed and the delta resource is added to the schedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813574#comment-13813574 ] Junping Du commented on YARN-311: - Updated in the v13 patch. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira will only contain changes in the scheduler. The flow to update a node's resource and awareness in resource scheduling is: 1. Resource update is through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM heartbeat for updating status comes, the RMNode's resource change will be noticed and the delta resource is added to the schedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813580#comment-13813580 ] Bikas Saha commented on YARN-1197: -- Wangda, sorry for the delayed response. Was caught up with other work. I will take a look at the new proposal. [~vinodkv] Can you please take a look at the latest proposal? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197.pdf Currently, YARN cannot support merging several containers on one node into a big container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1388) Fair Scheduler page always displays blank fair share
[ https://issues.apache.org/jira/browse/YARN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813581#comment-13813581 ] Hudson commented on YARN-1388: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4693 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4693/]) YARN-1388. Fair Scheduler page always displays blank fair share (Liyin Liang via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1538855) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java Fair Scheduler page always displays blank fair share Key: YARN-1388 URL: https://issues.apache.org/jira/browse/YARN-1388 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.1 Reporter: Liyin Liang Assignee: Liyin Liang Fix For: 2.2.1 Attachments: yarn-1388.diff YARN-1044 fixed the min/max/used resource display problem in the scheduler page. But the Fair Share has the same problem and needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813638#comment-13813638 ] Hadoop QA commented on YARN-674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612089/YARN-674.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2368//console This message is automatically generated. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813661#comment-13813661 ] Hadoop QA commented on YARN-311: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612092/YARN-311-v13.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2367//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2367//console This message is automatically generated. Dynamic node resource configuration: core scheduler changes --- Key: YARN-311 URL: https://issues.apache.org/jira/browse/YARN-311 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Reporter: Junping Du Assignee: Junping Du Attachments: YARN-311-v1.patch, YARN-311-v10.patch, YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira will only contain changes in the scheduler. The flow to update a node's resource and awareness in resource scheduling is: 1. Resource update is through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM heartbeat for updating status comes, the RMNode's resource change will be noticed and the delta resource is added to the schedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813672#comment-13813672 ] Bikas Saha commented on YARN-674: - We were intentionally going through the same submitApplication() method to make sure that all the initialization and setup code paths are consistently followed in both cases by keeping the code path identical as much as possible. The RM would submit a recovered application, in essence proxying a user submitting the application. It's a general pattern followed through the recovery logic - to be minimally invasive to the mainline code path so that we can avoid functional bugs as much as possible. Separating them into 2 methods has resulted in code duplication in both methods without any huge benefit that I can see. It also leaves us susceptible to future code changes being made in one code path and not the other. Why is isSecurityEnabled() being checked at this internal level? The code should not even reach this point if security is not enabled. It should already be taken care of in the public APIs, right? Also, why is it calling rmContext.getDelegationTokenRenewer().addApplication(event) instead of DelegationTokenRenewer.this.addApplication()? Same for rmContext.getDelegationTokenRenewer().applicationFinished(evt); {code} @SuppressWarnings("unchecked") +private void handleDTRenewerEvent( +DelegationTokenRenewerAppSubmitEvent event) { + try { +// Setup tokens for renewal +if (UserGroupInformation.isSecurityEnabled()) { + rmContext.getDelegationTokenRenewer().addApplication(event); + rmContext.getDispatcher().getEventHandler() + .handle(new RMAppEvent(event.getAppicationId(), + event.isApplicationRecovered() ? RMAppEventType.RECOVER + : RMAppEventType.START)); +} + } catch (Throwable t) { {code} These assumptions may make the code brittle to future changes. Also, typo in the comments. We should probably assert that the application state is NEW over here so that the broken assumption is caught at the source instead of at the destination app, causing a state machine crash. {code} +"Unable to add the application to the delegation token renewer.", +t); +// Sending APP_REJECTED is fine, since we assume that the +// RMApp is in NEW state and thus we havne't yet informed the +// Scheduler about the existence of the application +rmContext.getDispatcher().getEventHandler().handle( +new RMAppRejectedEvent(event.getAppicationId(), t.getMessage())); + } {code} typo {code} public ApplicationId getAppicationId() { {code} @Private + @VisibleForTesting??? {code} + //Only for Testing + public int getInProcessDelegationTokenRenewerEventsCount() { +return this.renewerCount.get(); + } {code} Can DelegationTokenRenewerAppSubmitEvent event objects have an event type different from VERIFY_AND_START_APPLICATION? If not, we don't need this check and we can change the constructor of DelegationTokenRenewerAppSubmitEvent to not expect an event type argument. It should set VERIFY_AND_START_APPLICATION within the constructor. {code} + if (evt.getType().equals( + DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION) + && evt instanceof DelegationTokenRenewerAppSubmitEvent) { {code} Rename DelegationTokenRenewerThread to not have the misleading Thread in the name? Why is this warning not happening for other services? What's special in the code for DelegationTokenRenewer?
{code} + <!-- Ignore Synchronization issues as they are never going to occur for + methods like serviceInit(), serviceStart() and handle() --> + <Match> + <Class name="org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer" /> + <Bug pattern="IS2_INCONSISTENT_SYNC" /> + </Match> {code} Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
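The constructor change suggested above - fixing the event type inside the subclass so callers cannot pass a different one - might look like the following sketch, with simplified stand-in classes rather than the real DelegationTokenRenewer events:

{code}
// Simplified sketch of the suggested constructor change; the real event
// classes carry application ids, credentials, and more state than this.
class RenewerEventSketch {
  enum DelegationTokenRenewerEventType { VERIFY_AND_START_APPLICATION, FINISH_APPLICATION }

  static class DelegationTokenRenewerEvent {
    private final DelegationTokenRenewerEventType type;
    DelegationTokenRenewerEvent(DelegationTokenRenewerEventType type) { this.type = type; }
    DelegationTokenRenewerEventType getType() { return type; }
  }

  static class DelegationTokenRenewerAppSubmitEvent extends DelegationTokenRenewerEvent {
    private final String applicationId;
    // The type is fixed here, so the instanceof check alone is enough and
    // no caller can construct this event with the wrong event type.
    DelegationTokenRenewerAppSubmitEvent(String applicationId) {
      super(DelegationTokenRenewerEventType.VERIFY_AND_START_APPLICATION);
      this.applicationId = applicationId;
    }
    String getApplicationId() { return applicationId; }
  }
}
{code}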
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813678#comment-13813678 ] Xuan Gong commented on YARN-1279: - bq. LogAggregationState: DISABLE -> DISABLED, NOT_START -> NOT_STARTED Changed. bq. Log aggregation is NM side config, this is getting from RM itself. Yes, you are right. Removed. Will rely on the containerLogAggregationState. bq. LogAggregationStatus may come via heartbeat before FinalTransition is called, inside which containerLogAggregationStatus is initialized with the containers. In this case, the log status is lost. Removed the initialization in FinalTransition. Only getting the number of finished containers at the FinalTransition state. Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1279: Attachment: YARN-1279.5.patch Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-261) Ability to kill AM attempts
[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-261: - Attachment: YARN-261--n7.patch Uploading a patch rebased after YARN-891 and with fixes according to Jason's comments. Ability to kill AM attempts --- Key: YARN-261 URL: https://issues.apache.org/jira/browse/YARN-261 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.0.3-alpha Reporter: Jason Lowe Assignee: Andrey Klochkov Attachments: YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch It would be nice if clients could ask for an AM attempt to be killed. This is analogous to the task attempt kill support provided by MapReduce. This feature would be useful in a scenario where AM retries are enabled, the AM supports recovery, and a particular AM attempt is stuck. Currently if this occurs the user's only recourse is to kill the entire application, requiring them to resubmit a new application and potentially breaking downstream dependent jobs if it's part of a bigger workflow. Killing the attempt would allow a new attempt to be started by the RM without killing the entire application, and if the AM supports recovery it could potentially save a lot of work. It could also be useful in workflow scenarios where the failure of the entire application kills the workflow, but the ability to kill an attempt can keep the workflow going if the subsequent attempt succeeds. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813685#comment-13813685 ] Bikas Saha commented on YARN-1121: -- There are 3 new booleans with 8 combinations possible between them. Which combinations are legal? Which are impossible? Some comments will help understand their interaction. Naming could be better, e.g. drainEventsOnStop instead of drainingStopNeeded and drainOnStop instead of drainingStop. {code} + private volatile boolean drained = true; + private volatile boolean drainingStopNeeded = false; + private volatile boolean drainingStop = false; {code} Typo {code} + LOG.info("Ignoring events as AsyncDispatcher is draning to stop."); {code} Isn't this almost a tight loop? Given that storing stuff will be over the network and slow, why not have a wait/notify between this thread and the draining thread? DrainEventHandler sounds misleading. It doesn't really drain; it ignores or drops events. The other thing we can do is take a count of the number of pending events to drain at service stop. Then make sure we drain only those many, thus ignoring the new ones. This removes the need for drainingStop and reduces the combinatorics of the booleans. RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.2.1 Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813686#comment-13813686 ] Bikas Saha commented on YARN-1121: -- Code for the comment above - Isn't this almost a tight loop? Given that storing stuff will be over the network and slow, why not have a wait/notify between this thread and the draining thread? {code} protected void serviceStop() throws Exception { +if (drainingStopNeeded) { + drainingStop = true; + while(!drained) { +Thread.yield(); + } {code} RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Jian He Fix For: 2.2.1 Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1#6144)
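A wait/notify version of that stop path, sketched with illustrative names rather than the actual AsyncDispatcher fields, would let the stopping thread sleep instead of spinning:

{code}
// Illustrative sketch of replacing the Thread.yield() busy loop with
// wait/notify; names do not match the actual AsyncDispatcher code.
class DrainSketch {
  private final Object drainLock = new Object();
  private boolean drained = false;

  /** Called by the dispatcher thread once the event queue is empty. */
  void markDrained() {
    synchronized (drainLock) {
      drained = true;
      drainLock.notifyAll();
    }
  }

  /** Called from serviceStop(): block until pending events are flushed. */
  void awaitDrain() throws InterruptedException {
    synchronized (drainLock) {
      while (!drained) {
        drainLock.wait(); // sleeps instead of spinning
      }
    }
  }
}
{code}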
[jira] [Commented] (YARN-1307) Rethink znode structure for RM HA
[ https://issues.apache.org/jira/browse/YARN-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813689#comment-13813689 ] Bikas Saha commented on YARN-1307: -- This probably needs major rebasing after the recent changes to the state store APIs that retain completed applications instead of deleting them. Rethink znode structure for RM HA - Key: YARN-1307 URL: https://issues.apache.org/jira/browse/YARN-1307 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1307.1.patch, YARN-1307.2.patch, YARN-1307.3.patch Rethinking the znode structure for RM HA is proposed in some JIRAs (YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isn't required anymore. {quote} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-979) [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813702#comment-13813702 ] Zhijie Shen commented on YARN-979: -- I still have one question w.r.t. the annotations of the getters/setters of GetRequest/Response. Some of them are marked as \@Stable, and some are marked as \@Unstable. In addition, some setters are marked as \@Private, and some are marked as \@Public. Do you have a special consideration here? Maybe we should mark all of them as \@Unstable for the initial AHS? bq. I will create the jira for making applicationclientprotocol similar to applicationHistoryProtocol Thanks for filing the ticket. Ideally, we'd like to have the paired ApplicationClientProtocol and ApplicationHistoryProtocol. Then YarnClient can query running applications/attempts/containers from ApplicationClientProtocol and finished ones from ApplicationHistoryProtocol, making it transparent to users. [YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol -- Key: YARN-979 URL: https://issues.apache.org/jira/browse/YARN-979 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-979-1.patch, YARN-979-3.patch, YARN-979-4.patch, YARN-979-5.patch, YARN-979.2.patch ApplicationHistoryProtocol should have the following APIs as well: * getApplicationAttemptReport * getApplicationAttempts * getContainerReport * getContainers The corresponding request and response classes need to be added as well. -- This message was sent by Atlassian JIRA (v6.1#6144)
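For reference, the audience/stability markers under discussion are Hadoop's standard classification annotations; a sketch of the convention on a hypothetical record class (not the actual YARN-979 code) could look like:

{code}
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Unstable;

// Hypothetical record, only to illustrate the annotation convention;
// marking everything @Unstable leaves room to evolve the initial AHS API.
@Public
@Unstable
public abstract class GetContainerReportRequestSketch {

  @Public
  @Unstable
  public abstract String getContainerId();

  // Setters are typically @Private: the framework builds the PB record,
  // while end users only read it.
  @Private
  @Unstable
  public abstract void setContainerId(String containerId);
}
{code}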
[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete
[ https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813706#comment-13813706 ] Hadoop QA commented on YARN-1279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612118/YARN-1279.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2370//console This message is automatically generated. Expose a client API to allow clients to figure if log aggregation is complete - Key: YARN-1279 URL: https://issues.apache.org/jira/browse/YARN-1279 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Xuan Gong Attachments: YARN-1279.1.patch, YARN-1279.2.patch, YARN-1279.2.patch, YARN-1279.3.patch, YARN-1279.3.patch, YARN-1279.4.patch, YARN-1279.4.patch, YARN-1279.5.patch Expose a client API to allow clients to figure if log aggregation is complete -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs
[ https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1389: -- Description: As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. was: At some point we need more infor in applicationClientProtocol which we have in ApplicationHistoryProtocol. We need to merge those. Summary: ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs (was: Merging the ApplicationClientProtocol and ApplicationHistoryProtocol) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analog APIs -- Key: YARN-1389 URL: https://issues.apache.org/jira/browse/YARN-1389 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813712#comment-13813712 ] Mayank Bansal commented on YARN-955: Thanks [~zjshen] for the review. bq. 1. Is there any special reason to rename ASHService to ApplicationHistoryClientService? It's a more verbose name, and the same as other classes. bq. 2. Inner ApplicationHSClientProtocolHandler is not necessary. ApplicationHistoryClientService can directly implement ApplicationHistoryProtocol, which is what ASHService did before. I used the same design pattern used in the Job History server. Moreover, it's a cleaner design than having the service derive from everything. Secondly, you can have multiple protocol implementations. bq. 3. Incorrect log below: Done. bq. 4. We should use the newInstance method from the record class for GetApplicationAttemptReportResponse and all the other records. Done. bq. 5. Some methods missed @Override, for example They are not overridden methods; those are helper functions. bq. 6. The two methods below are not implemented, but we can do it separately, because we need to implement a DelegationTokenSecretManager first. Those will be implemented once we implement security. bq. 7. Did you miss ApplicationHistoryContext in the patch or is it included in the patch of another jira? HistoryContext is part of YARN-987. bq. 8. Why does the method below have the default access control? It is used in a test. bq. 9. In RM and NM, we usually add a protected create() method for a sub service, such that we can override it and change to another implementation. It is convenient when we want to mock some part of AHS when drafting the test cases. Done. bq. 10. Shall we have the test cases for the ApplicationHistoryProtocol implementation? Done. [YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-955: --- Attachment: YARN-955-2.patch Adding the latest patch. Thanks, Mayank [YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813716#comment-13813716 ] Bikas Saha commented on YARN-1222: -- @Private? {code}+ public static String getConfValueForRMInstance(String prefix,{code} If the RM is the one creating the root znode then how can someone else's ACLs be present on that znode? i.e. how can the ACLs on the root znode have any other entries? My concern is that we are only adding new ACLs every time we fail over but never deleting them. Is it possible that we end up creating too many ACLs for the root znode and hit ZK issues? {code} +Id rmId = new Id(zkRootNodeAuthScheme, +DigestAuthenticationProvider.generateDigest( +zkRootNodeUsername + ":" + zkRootNodePassword)); +zkRootNodeAcl.add(new ACL(CREATE_DELETE_PERMS, rmId)); +return zkRootNodeAcl; {code} For both of the above, can we use well-known prefixes for the root znode ACLs (rm-admin-acl and rm-cd-acl)? When fencing we don't touch the rm-admin-acl but remove all rm-cd-acl's. We then add a new rm-cd-acl for ourselves; we don't touch any other ACL. Where is the shared rm-admin-acl being set such that both RMs have admin access to the root znode? How is the following case going to work? How can the root node ACL be set in the conf? Upon becoming active, we have to remove the old RM's cd-acl and set our cd-acl. That cannot be statically set in conf, right? {code} if (HAUtil.isHAEnabled(conf)) { + String zkRootNodeAclConf = HAUtil.getConfValueForRMInstance + (YarnConfiguration.ZK_RM_STATE_STORE_ROOT_NODE_ACL, conf); + if (zkRootNodeAclConf != null) { +zkRootNodeAclConf = ZKUtil.resolveConfIndirection(zkRootNodeAclConf); +try { + zkRootNodeAcl = ZKUtil.parseACLs(zkRootNodeAclConf); +} catch (ZKUtil.BadAclFormatException bafe) { + LOG.error("Invalid format for " + + YarnConfiguration.ZK_RM_STATE_STORE_ROOT_NODE_ACL); + throw bafe; +} + } {code} The test should probably create separate copies of conf for the 2 RMs. Won't we get an exception/error from this? {code}+ rmService.submitApplication(SubmitApplicationRequest.newInstance(asc)); {code} Let's put a comment saying: triggering a state store operation that makes rm1 realize that it's not the master because it got fenced by the store. This and other similar places need an @Private {code}+ @VisibleForTesting + public void createWithRetries({code} Can you please specify in comments which operations are exempt from multi-operation? Looks like only write operations go through multi, the exceptions being initial znode creation and fence-on-active. Right? Can we move this logic into the common RMStateStore and notify it about HA state loss via a standard HA exception? Will the null return make the state store crash? {code} +} catch (KeeperException.NoAuthException nae) { + if (HAUtil.isHAEnabled(getConfig())) { +// Transition to standby +RMHAServiceTarget target = new RMHAServiceTarget( +(YarnConfiguration)getConfig()); +target.getProxy(getConfig(), 1000).transitionToStandby( +new HAServiceProtocol.StateChangeRequestInfo( +HAServiceProtocol.RequestSource.REQUEST_BY_USER_FORCED)); +return null; + } {code} Make improvements in ZKRMStateStore for fencing --- Key: YARN-1222 URL: https://issues.apache.org/jira/browse/YARN-1222 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch Using multi-operations for every ZK interaction.
In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode. -- This message was sent by Atlassian JIRA (v6.1#6144)
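To illustrate the fencing scheme being reviewed - each active RM claiming exclusive create/delete on the root znode via its own digest ACL - here is a minimal sketch using the standard ZooKeeper client API; the username/password are placeholders, not the RM's actual configuration:

{code}
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.ZooDefs.Perms;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;
import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

// Minimal sketch of a digest-based fencing ACL; credentials are placeholders.
class FencingAclSketch {
  private static final int CREATE_DELETE_PERMS = Perms.CREATE | Perms.DELETE;

  /** ACL granting this RM exclusive create/delete on the root znode. */
  static List<ACL> rootNodeFencingAcl(String rmUsername, String rmPassword)
      throws NoSuchAlgorithmException {
    List<ACL> acls = new ArrayList<ACL>();
    Id rmId = new Id("digest",
        DigestAuthenticationProvider.generateDigest(rmUsername + ":" + rmPassword));
    // On failover the new active RM would replace the previous RM's
    // create/delete entry with its own, leaving shared admin ACLs intact.
    acls.add(new ACL(CREATE_DELETE_PERMS, rmId));
    return acls;
  }
}
{code}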
[jira] [Commented] (YARN-987) Adding History Service to use Store and converting Historydata to Report
[ https://issues.apache.org/jira/browse/YARN-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813721#comment-13813721 ] Zhijie Shen commented on YARN-987: -- bq. As we discussed offline, yes, that's similar to the JHS design, and as we decided to go for the cache implementation I think it makes sense to have a clear separation between these two. As we're going to have a cache, the abstraction of ApplicationHistoryContext may be necessary. However, one more question here: webUI and services are going to use ApplicationHistoryContext as well, right? If they are, returning the report PB is actually not necessary for the web. If they're not, webUI and services need a duplicate abstraction combining cache and store, which is not concise in terms of coding. Some detailed comments on the patch: * Add the config to yarn-default.xml as well. Btw, is store.class a bit better, as we have XXXApplicationHistoryStore, not XXXApplicationHistoryStorage? {code} + /** AHS STORAGE CLASS */ + public static final String AHS_STORAGE = AHS_PREFIX + "storage.class"; {code} * The methods in ApplicationHistoryContext need to be annotated as well. * Unnecessary code. ApplicationHistoryStore must be a service. {code} +if (historyStore instanceof Service) { + ((Service) historyStore).init(conf); +} {code} * Is it better to rename getFinal to convertToReport? * For ApplicationReport, you may want to get the history data of its last application attempt to fill the empty fields below. {code} +return ApplicationReport.newInstance(appHistory.getApplicationId(), null, + appHistory.getUser(), appHistory.getQueue(), appHistory +.getApplicationName(), "", 0, null, null, "", "", appHistory +.getStartTime(), appHistory.getFinishTime(), appHistory +.getFinalApplicationStatus(), null, "", 100, appHistory +.getApplicationType(), null); {code} Adding History Service to use Store and converting Historydata to Report Key: YARN-987 URL: https://issues.apache.org/jira/browse/YARN-987 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-987-1.patch, YARN-987-2.patch, YARN-987-3.patch, YARN-987-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)