[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-05-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013356#comment-14013356
 ] 

Jian He commented on YARN-1366:
---

The bulk of the patch here is MR changes. Should we have an MR jira to track 
the MR changes? Both patches are closely related, and the combined patch size 
seems reasonable enough to consolidate. It's fine to leave as-is, but it would 
be easier for reviewers to have more context.

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM instead. Resync means resetting the allocate RPC 
> sequence number to 0, after which the AM should send its entire outstanding 
> request to the RM. Note that if the AM is making its first allocate call to 
> the RM, things should proceed as normal without needing a resync. The RM 
> will return all containers that have completed since the RM last synced with 
> the AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013325#comment-14013325
 ] 

Hadoop QA commented on YARN-2103:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647536/YARN-2103.v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3864//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3864//console

This message is automatically generated.

> Inconsistency between viaProto flag and initial value of 
> SerializedExceptionProto.Builder
> -
>
> Key: YARN-2103
> URL: https://issues.apache.org/jira/browse/YARN-2103
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch
>
>
> Bug 1:
> {code}
>   SerializedExceptionProto proto = SerializedExceptionProto
>   .getDefaultInstance();
>   SerializedExceptionProto.Builder builder = null;
>   boolean viaProto = false;
> {code}
> Since viaProto is false, we should initialize builder rather than proto.
> Bug 2:
> The class does not provide hashCode() and equals() like other PBImpl records; 
> since it is used in other records, it may affect their behavior. 
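For illustration, the fixes implied by this description could look like the 
following sketch, based on the common PBImpl pattern (not necessarily the 
attached patch):
{code}
// Bug 1 (sketch): since viaProto starts as false, initialize the builder, not the proto.
SerializedExceptionProto.Builder builder = SerializedExceptionProto.newBuilder();
boolean viaProto = false;

// Bug 2 (sketch): delegate equals()/hashCode() to the underlying proto,
// as other PBImpl records do.
@Override
public int hashCode() {
  return getProto().hashCode();
}

@Override
public boolean equals(Object other) {
  if (!(other instanceof SerializedExceptionPBImpl)) {
    return false;
  }
  return getProto().equals(((SerializedExceptionPBImpl) other).getProto());
}
{code}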



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-05-29 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated YARN-2103:


Attachment: YARN-2103.v2.patch

Uploaded the wrong patch... resubmitting

> Inconsistency between viaProto flag and initial value of 
> SerializedExceptionProto.Builder
> -
>
> Key: YARN-2103
> URL: https://issues.apache.org/jira/browse/YARN-2103
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch
>
>
> Bug 1:
> {code}
>   SerializedExceptionProto proto = SerializedExceptionProto
>   .getDefaultInstance();
>   SerializedExceptionProto.Builder builder = null;
>   boolean viaProto = false;
> {code}
> Since viaProto is false, we should initialize builder rather than proto.
> Bug 2:
> The class does not provide hashCode() and equals() like other PBImpl records; 
> since it is used in other records, it may affect their behavior. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-05-29 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated YARN-2103:


Attachment: (was: YARN-2103.v1.patch)

> Inconsistency between viaProto flag and initial value of 
> SerializedExceptionProto.Builder
> -
>
> Key: YARN-2103
> URL: https://issues.apache.org/jira/browse/YARN-2103
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch
>
>
> Bug 1:
> {code}
>   SerializedExceptionProto proto = SerializedExceptionProto
>   .getDefaultInstance();
>   SerializedExceptionProto.Builder builder = null;
>   boolean viaProto = false;
> {code}
> Since viaProto is false, we should initialize builder rather than proto.
> Bug 2:
> The class does not provide hashCode() and equals() like other PBImpl records; 
> since it is used in other records, it may affect their behavior. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-05-29 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated YARN-2103:


Attachment: YARN-2103.v1.patch

Thanks for the review, [~djp]. Updated the patch to address your comments.

> Inconsistency between viaProto flag and initial value of 
> SerializedExceptionProto.Builder
> -
>
> Key: YARN-2103
> URL: https://issues.apache.org/jira/browse/YARN-2103
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2103.v1.patch, YARN-2103.v2.patch
>
>
> Bug 1:
> {code}
>   SerializedExceptionProto proto = SerializedExceptionProto
>   .getDefaultInstance();
>   SerializedExceptionProto.Builder builder = null;
>   boolean viaProto = false;
> {code}
> Since viaProto is false, we should initialize builder rather than proto.
> Bug 2:
> The class does not provide hashCode() and equals() like other PBImpl records; 
> since it is used in other records, it may affect their behavior. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-05-29 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1366:


Summary: AM should implement Resync with the ApplicationMasterService 
instead of shutting down  (was: AM should implement Resync with the 
ApplicationMaster instead of shutting down)

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM instead. Resync means resetting the allocate RPC 
> sequence number to 0, after which the AM should send its entire outstanding 
> request to the RM. Note that if the AM is making its first allocate call to 
> the RM, things should proceed as normal without needing a resync. The RM 
> will return all containers that have completed since the RM last synced with 
> the AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-05-29 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013281#comment-14013281
 ] 

Anubhav Dhoot commented on YARN-1366:
-

Updated the title to more accurately reflect the changes covered in this jira 

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM instead. Resync means resetting the allocate RPC 
> sequence number to 0, after which the AM should send its entire outstanding 
> request to the RM. Note that if the AM is making its first allocate call to 
> the RM, things should proceed as normal without needing a resync. The RM 
> will return all containers that have completed since the RM last synced with 
> the AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMaster instead of shutting down

2014-05-29 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1366:


Summary: AM should implement Resync with the ApplicationMaster instead of 
shutting down  (was: ApplicationMasterService should Resync with the AM upon 
allocate call after restart)

> AM should implement Resync with the ApplicationMaster instead of shutting down
> --
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM instead. Resync means resetting the allocate RPC 
> sequence number to 0, after which the AM should send its entire outstanding 
> request to the RM. Note that if the AM is making its first allocate call to 
> the RM, things should proceed as normal without needing a resync. The RM 
> will return all containers that have completed since the RM last synced with 
> the AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-29 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013256#comment-14013256
 ] 

Rohith commented on YARN-1366:
--

Hi [~vinodkv],
I agree that the two issues' titles look similar, but there is a clear 
separation between the tasks: one covers the YARN server side and the other 
the YARN client side (MR is also expected to change).
As per an offline discussion with [~adhoot], he will be changing the code on 
the YARN server side and I will change the code on the YARN client side (MR 
included).

In terms of solution, basically:
- YARN-1365: Handles the YARN server side, i.e. after RM restart, completion 
of the recovery process would expect that:
** ApplicationMasterService should be able to receive allocate and unregister 
requests from an already running application master, and a register request 
from a newly launched application master, without any failure.
** For a running application master, the RESYNC command is issued on an 
allocate request. An unregister call should succeed without any resync after 
RM restart.
** The RM should be able to handle duplicate release requests from the AM.

- YARN-1366: Handles only the YARN client side (MR is also expected to change 
to benefit from this feature), i.e. upon RESYNC the AM should (see the sketch 
below):
** set the last response id to 0 and register with the RM again;
** send all outstanding requests, such as ask, release, and blacklisted nodes.
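
For illustration, a minimal sketch of that client-side resync handling (helper 
names are hypothetical; this is not the patch itself):
{code}
// Sketch only: how an AM might react to the RESYNC command.
AllocateResponse response = amRmProtocol.allocate(allocateRequest);
if (response.getAMCommand() == AMCommand.AM_RESYNC) {
  lastResponseId = 0;              // reset the allocate RPC sequence number
  reRegisterWithRM();              // hypothetical helper: register with the RM again
  resendOutstandingRequests();     // hypothetical helper: resend ask/release/blacklist
}
{code}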


> ApplicationMasterService should Resync with the AM upon allocate call after 
> restart
> ---
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM instead. Resync means resetting the allocate RPC 
> sequence number to 0, after which the AM should send its entire outstanding 
> request to the RM. Note that if the AM is making its first allocate call to 
> the RM, things should proceed as normal without needing a resync. The RM 
> will return all containers that have completed since the RM last synced with 
> the AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new ContainerRecoveryReport

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013236#comment-14013236
 ] 

Hadoop QA commented on YARN-2115:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647509/YARN-2115.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3863//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3863//console

This message is automatically generated.

> Replace RegisterNodeManagerRequest#ContainerStatus with a new 
> ContainerRecoveryReport
> -
>
> Key: YARN-2115
> URL: https://issues.apache.org/jira/browse/YARN-2115
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2115.1.patch
>
>
> This jira covers protocol changes only: replace the ContainerStatus sent 
> across via the NM register call with a new ContainerRecoveryReport that 
> includes all the necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem

2014-05-29 Thread Subramaniam Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Krishnan updated YARN-1709:
---

Attachment: YARN-1709.patch

The attached patch contains the in-memory data structures to track 
reservations over time:

 * _Plan_: The central data structure of the reservation system; it maintains 
the "agenda" for the cluster, i.e. how the client reservations accepted so far 
will be honoured.

 * _ReservationAllocation_: A concrete instance of resources allocated over 
time to satisfy a single client reservation request.

 * _RLESparseResourceAllocation_: A run-length-encoded sparse data structure 
that maintains cumulative resource allocations over time.
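
To make the run-length-encoding idea concrete, a minimal sketch (illustrative 
only, not the patch code): store only the points in time where the allocation 
changes, and answer lookups with a floor query:
{code}
// Sketch: run-length-encoded resource allocation over time.
TreeMap<Long, Integer> steps = new TreeMap<Long, Integer>(); // time -> capacity from that point on

void setCapacityFrom(long time, int capacity) {
  steps.put(time, capacity);              // record only the change points
}

int getCapacityAtTime(long time) {
  Map.Entry<Long, Integer> e = steps.floorEntry(time);
  return (e == null) ? 0 : e.getValue();  // no entry before 'time' means no allocation
}
{code}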

> Admission Control: Reservation subsystem
> 
>
> Key: YARN-1709
> URL: https://issues.apache.org/jira/browse/YARN-1709
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Subramaniam Krishnan
> Attachments: YARN-1709.patch
>
>
> This JIRA is about the key data structure used to track resources over time 
> to enable YARN-1051. The Reservation subsystem is conceptually a "plan" of 
> how the scheduler will allocate resources over-time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013216#comment-14013216
 ] 

Hadoop QA commented on YARN-2091:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647505/YARN-2091.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3862//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3862//console

This message is automatically generated.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
> YARN-2091.4.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new ContainerRecoveryReport

2014-05-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2115:
--

Attachment: YARN-2115.1.patch

Patch to replace the ContainerStatus with ContainerRecoveryReport

> Replace RegisterNodeManagerRequest#ContainerStatus with a new 
> ContainerRecoveryReport
> -
>
> Key: YARN-2115
> URL: https://issues.apache.org/jira/browse/YARN-2115
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2115.1.patch
>
>
> This jira covers protocol changes only: replace the ContainerStatus sent 
> across via the NM register call with a new ContainerRecoveryReport that 
> includes all the necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013207#comment-14013207
 ] 

Vinod Kumar Vavilapalli commented on YARN-1365:
---

bq. Can we please consolidate YARN-1366 and YARN-1365 into one JIRA?
I was asked to elaborate offline.

The titles are
 - YARN-1365: ApplicationMasterService to allow Register and Unregister of an 
app that was running before restart
 - YARN-1366: ApplicationMasterService should Resync with the AM upon allocate 
call after restart

I haven't looked at the set of patches, but they seem like either they are two 
different solutions to the same problem or, if not, they will likely conflict a 
lot in terms of code changes.

> ApplicationMasterService to allow Register and Unregister of an app that was 
> running before restart
> ---
>
> Key: YARN-1365
> URL: https://issues.apache.org/jira/browse/YARN-1365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1365.001.patch, YARN-1365.002.patch, 
> YARN-1365.003.patch, YARN-1365.initial.patch
>
>
> For an application that was running before restart, the 
> ApplicationMasterService currently throws an exception when the app tries to 
> make the initial register or final unregister call. These should succeed, and 
> the RMApp state machine should transition to completed as normal. 
> Unregistration should succeed for an app that the RM considers complete, 
> since the RM may have died after saving completion in the store but before 
> notifying the AM that it is free to exit.
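
For illustration, a minimal sketch of the tolerant unregister path described 
above (the predicate name is hypothetical; the actual patch may differ):
{code}
// Sketch only: let unregister succeed for an app the RM already considers complete.
if (rmApp.isAppFinalStateStored()) {   // hypothetical predicate on the recovered app
  // The RM saved completion before dying; tell the AM it is free to exit.
  return FinishApplicationMasterResponse.newInstance(true);
}
{code}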



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013206#comment-14013206
 ] 

Vinod Kumar Vavilapalli commented on YARN-1366:
---

bq. Can we please consolidate YARN-1366 and YARN-1365 into one JIRA?
I was asked to elaborate offline.

The titles are
 - YARN-1365: ApplicationMasterService to allow Register and Unregister of an 
app that was running before restart
 - YARN-1366: ApplicationMasterService should Resync with the AM upon allocate 
call after restart

I haven't looked at the set of patches, but they seem like either they are two 
different solutions to the same problem or, if not, they will likely conflict a 
lot in terms of code changes.

> ApplicationMasterService should Resync with the AM upon allocate call after 
> restart
> ---
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM instead. Resync means resetting the allocate RPC 
> sequence number to 0, after which the AM should send its entire outstanding 
> request to the RM. Note that if the AM is making its first allocate call to 
> the RM, things should proceed as normal without needing a resync. The RM 
> will return all containers that have completed since the RM last synced with 
> the AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013201#comment-14013201
 ] 

Vinod Kumar Vavilapalli commented on YARN-2010:
---

bq. For applications completed before starting in secured mode, 
clientTokenMasterKey is null. After starting in secured mode, recovery of 
those apps fails since clientTokenMasterKey is null. While recovering an 
application, the RM should have the intelligence to decide whether the 
recovering application ran in secured or non-secured mode. This is possible by 
checking clientTokenMasterKey for null.
bq. Please, can this be considered a "Blocker" as there seems no way to recover 
from this and still transition to secured mode?
Apologies for coming in real late. When is this exception/crash manifesting?

It seemed like this is when you try to upgrade from a non-secure cluster to a 
secure cluster. Is that so? That is a completely unsupportable use-case. There 
are so many other things that will be broken when you do such an upgrade with 
existing applications - think tokens needed, localized files etc.

Just trying to make sure we are not fixing 'issues' to support unsupportable 
use-cases.
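
For reference, the null check described in the quoted comment would look 
roughly like this (sketch with hypothetical names, not the actual patch):
{code}
// Sketch only: decide during recovery whether the app ran in secured mode.
if (appState.getClientTokenMasterKey() != null) {
  // app ran in secured mode; recover the client-to-AM token master key as usual
} else {
  // app ran without security; skip the secure-mode-only recovery steps
}
{code}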

> RM can't transition to active if it can't recover an app attempt
> 
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-2010.1.patch, YARN-2010.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch
>
>
> If the RM fails to recover an app attempt, it won't come up. We should make 
> it more resilient.
> Specifically, the underlying error is that the app was submitted before 
> Kerberos security got turned on. Makes sense for the app to fail in this 
> case. But YARN should still start.
> {noformat}
> 2014-04-11 11:56:37,216 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election 
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to 
> Active 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
>  
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>  
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) 
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:274)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116)
>  
> ... 4 more 
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:811)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:842)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:265)
>  
> ... 5 more 
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.lang.IllegalArgumentException: Missing argument 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:372)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:273)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:406)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1000)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:462)
>  
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> ... 8 more 
> Caused by: java.lang.IllegalArgumentException: Missing argument 
> at javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:93) 
> at 
> org.apache.hadoop.security.token.SecretManager.createSecretKey(SecretManager.java:188)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ClientToAMTokenSecretManagerInR

[jira] [Updated] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2091:
-

Attachment: YARN-2091.4.patch

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch, 
> YARN-2091.4.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1877) ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root node auth

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013178#comment-14013178
 ] 

Hadoop QA commented on YARN-1877:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647496/YARN-1877.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3861//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3861//console

This message is automatically generated.

> ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root 
> node auth
> ---
>
> Key: YARN-1877
> URL: https://issues.apache.org/jira/browse/YARN-1877
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: YARN-1877.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013171#comment-14013171
 ] 

Hadoop QA commented on YARN-1913:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647487/YARN-1913.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3860//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3860//console

This message is automatically generated.

> With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
> --
>
> Key: YARN-1913
> URL: https://issues.apache.org/jira/browse/YARN-1913
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Wei Yan
> Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
> YARN-1913.patch, YARN-1913.patch
>
>
> It's possible to deadlock a cluster by submitting many applications at once 
> and having all cluster resources taken up by AMs.
> One solution is for the scheduler to limit the resources taken up by AMs, as 
> a percentage of total cluster resources, via a "maxApplicationMasterShare" 
> config.
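
For illustration, the kind of check such a limit implies (sketch with 
hypothetical names; "maxApplicationMasterShare" is the config proposed above):
{code}
// Sketch only: gate AM container launches on the share of cluster resources held by AMs.
float maxAMShare = conf.getFloat("maxApplicationMasterShare", 0.5f); // assumed default
boolean canLaunchAM =
    amMemoryUsed + amMemoryAsked <= maxAMShare * clusterMemoryTotal;  // illustrative
{code}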



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013150#comment-14013150
 ] 

Hadoop QA commented on YARN-2091:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647485/YARN-2091.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3859//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3859//console

This message is automatically generated.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1877) ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root node auth

2014-05-29 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013148#comment-14013148
 ] 

Robert Kanter commented on YARN-1877:
-

Discussed with Karthik offline.  There's no need to add an additional property 
for the root node's auth, as all auths are added the same way to the ZooKeeper 
client (i.e. an auth is not tied to a specific node; only the ACLs are), and 
we already have {{yarn.resourcemanager.zk-auth}}, which can be used for 
passing in a list of auths.

However, {{yarn.resourcemanager.zk-auth}} is not documented in 
yarn-default.xml, so we can repurpose this JIRA to document this property, 
including the fact that it accepts auths for both 
{{yarn.resourcemanager.zk-acl}} and 
{{yarn.resourcemanager.zk-state-store.root-node.acl}}.
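
For example, a minimal sketch of supplying the auths programmatically (the 
digest scheme and credentials shown are assumptions, not from this thread):
{code}
// Sketch only: yarn.resourcemanager.zk-auth takes a comma-separated list of
// scheme:auth pairs that are added to the ZooKeeper client.
Configuration conf = new YarnConfiguration();
conf.set("yarn.resourcemanager.zk-auth", "digest:rmuser:rmpass"); // assumed example value
{code}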

> ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root 
> node auth
> ---
>
> Key: YARN-1877
> URL: https://issues.apache.org/jira/browse/YARN-1877
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: YARN-1877.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1877) ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root node auth

2014-05-29 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-1877:


Attachment: YARN-1877.patch

> ZK store: Add yarn.resourcemanager.zk-state-store.root-node.auth for root 
> node auth
> ---
>
> Key: YARN-1877
> URL: https://issues.apache.org/jira/browse/YARN-1877
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: YARN-1877.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-29 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Attachment: YARN-1913.patch

Thanks, Sandy. 
Uploaded a new patch using the more accurate approach.

> With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
> --
>
> Key: YARN-1913
> URL: https://issues.apache.org/jira/browse/YARN-1913
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Wei Yan
> Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
> YARN-1913.patch, YARN-1913.patch
>
>
> It's possible to deadlock a cluster by submitting many applications at once 
> and having all cluster resources taken up by AMs.
> One solution is for the scheduler to limit the resources taken up by AMs, as 
> a percentage of total cluster resources, via a "maxApplicationMasterShare" 
> config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs

2014-05-29 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1913:
--

Labels:   (was: easyfix)

> With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
> --
>
> Key: YARN-1913
> URL: https://issues.apache.org/jira/browse/YARN-1913
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Wei Yan
> Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, 
> YARN-1913.patch, YARN-1913.patch
>
>
> It's possible to deadlock a cluster by submitting many applications at once 
> and having all cluster resources taken up by AMs.
> One solution is for the scheduler to limit the resources taken up by AMs, as 
> a percentage of total cluster resources, via a "maxApplicationMasterShare" 
> config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2091:
-

Attachment: YARN-2091.3.patch

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013119#comment-14013119
 ] 

Sandy Ryza commented on YARN-2091:
--

ContainerExitStatus should stay an int.  While ContainerStatus.getExitStatus is 
technically marked Unstable, I'm sure changing this would break some 
applications.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch, YARN-2091.3.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new ContainerRecoveryReport

2014-05-29 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2115:
--

Issue Type: Improvement  (was: Bug)

> Replace RegisterNodeManagerRequest#ContainerStatus with a new 
> ContainerRecoveryReport
> -
>
> Key: YARN-2115
> URL: https://issues.apache.org/jira/browse/YARN-2115
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
>
> This jira covers protocol changes only: replace the ContainerStatus sent 
> across via the NM register call with a new ContainerRecoveryReport that 
> includes all the necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2115) Replace RegisterNodeManagerRequest#ContainerStatus with a new ContainerRecoveryReport

2014-05-29 Thread Jian He (JIRA)
Jian He created YARN-2115:
-

 Summary: Replace RegisterNodeManagerRequest#ContainerStatus with a 
new ContainerRecoveryReport
 Key: YARN-2115
 URL: https://issues.apache.org/jira/browse/YARN-2115
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He


This jira covers protocol changes only: replace the ContainerStatus sent 
across via the NM register call with a new ContainerRecoveryReport that 
includes all the necessary information for container recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013102#comment-14013102
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

Found that it's simple enough to add methods like isAMAware() to 
ContainerExitStatus:
{code}
public class ContainerExitStatus {

  public static boolean isAMAware(int exitReason) {
return (exitReason == DISKS_FAILED)
|| (exitReason == KILL_EXCEEDED_PMEM)
|| (exitReason == KILL_EXCEEDED_VMEM);
  }
}
{code}
I'd like to leave ContainerExitStatus as a plain class rather than an enum for 
now. Please let me know if you have suggestions about this.
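
For example, an AM could use such a helper when processing completed 
containers (usage sketch; assumes the helper above):
{code}
// Usage sketch: act on YARN-initiated kill reasons surfaced to the AM.
for (ContainerStatus status : allocateResponse.getCompletedContainersStatuses()) {
  if (ContainerExitStatus.isAMAware(status.getExitStatus())) {
    // e.g. on KILL_EXCEEDED_PMEM, retry the task with a larger memory ask
  }
}
{code}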

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013093#comment-14013093
 ] 

Tsuyoshi OZAWA commented on YARN-2091:
--

Thanks for the suggestions, Bikas and Vinod. That would make sense too. I'll 
plan to separate the pmem and vmem events and use the kill reasons directly.

I have one discussion point: should we make ContainerExitStatus an enum, since 
an EnumSet would let us easily judge which events should be handled by the AM? 
One concern is backward compatibility. This may be overkill. What do you think?

{code}
public enum ContainerExitStatus {
  ...
  KILL_EXCEEDED_VMEM(-103),
  KILL_EXCEEDED_PMEM(-104);

  public static final EnumSet<ContainerExitStatus> AM_EVENTS = EnumSet.of(
  DISKS_FAILED,
  PREEMPTED,
  KILL_EXCEEDED_VMEM,
  KILL_EXCEEDED_PMEM
  );
}
{code}

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013071#comment-14013071
 ] 

Bikas Saha commented on YARN-2091:
--

That would make sense if YARN allowed specifying pmem and vmem separately in 
the resource request. Without that, the information is not actionable.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013052#comment-14013052
 ] 

Vinod Kumar Vavilapalli commented on YARN-2091:
---

Should we differentiate killing due to exceeding pmem and vmem limits 
separately?

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-29 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013013#comment-14013013
 ] 

Sandy Ryza commented on YARN-1368:
--

Regardless of the technical approach taken in any initial patch, my 
understanding of basic JIRA practice is that assigning a JIRA to oneself (as 
Anubhav did here) means "give me some space to work on this".  Because Hadoop 
development is so distributed, it acts as a good coordination mechanism that 
helps us all avoid duplicating work.  Of course, we don't want people to be 
able to sit on JIRAs and stall progress, but a simple "Will you be able to get 
to this soon? If not, mind if I take it up?" and then waiting a couple of days 
usually suffices to deal with that.

Etiquette is obviously difficult to codify, and I realize that this discussion 
is detracting from the technical details of this JIRA, so I'll stop my 
belly-aching here.

> Common work to re-populate containers’ state into scheduler
> ---
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
> YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
> YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.
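
A sketch of the flow this implies (names and signatures are illustrative, not 
the actual patch):
{code}
// Sketch only: hand the containers reported at NM registration to the scheduler
// together with the node-added event.
List<NMContainerStatus> running = request.getNMContainerStatuses(); // assumed accessor
scheduler.handle(new NodeAddedSchedulerEvent(rmNode, running));     // illustrative signature
{code}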



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012980#comment-14012980
 ] 

Bikas Saha commented on YARN-2091:
--

Instead of having the following if-else code everywhere, can we simply always 
use the "reason" from the kill event? That way, if we add a different reason 
tomorrow (exceeded disk quota), we don't have to find all these special cases.
{code}
+  if (killEvent.getReason() == ContainerExitStatus.KILL_EXCEEDED_MEMORY) {
+    container.exitCode = killEvent.getReason();
+  } else {
+    container.exitCode = ExitCode.TERMINATED.getExitCode();
+  }
{code}
As an exercise, we can find other cases of ContainerKillEvent and see if new 
enums (like exceeded_memory) can be added, or if a suitable default value can 
be found. Clearly, if we are killing, then there should be a reason.
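
For illustration, a minimal sketch of that simplification, assuming the kill 
event is always constructed with a meaningful reason (names and values below 
are stand-ins, not the real constants):
{code}
// Sketch only, with stand-in values: apply the default reason once, where
// the kill event is constructed, so each handler can simply do
// "container.exitCode = killEvent.getReason()" unconditionally.
final class KillReasonSketch {
  static final int UNSET = 0;        // stand-in: no specific reason given
  static final int TERMINATED = -1;  // stand-in for ExitCode.TERMINATED

  static int reasonOrDefault(int requested) {
    return requested == UNSET ? TERMINATED : requested;
  }

  public static void main(String[] args) {
    System.out.println(reasonOrDefault(UNSET)); // falls back to TERMINATED
    System.out.println(reasonOrDefault(-104));  // a specific kill reason
  }
}
{code}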

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2091:
--

Target Version/s: 2.5.0

Targeting 2.5.0. Let me look at it.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012795#comment-14012795
 ] 

Vinod Kumar Vavilapalli commented on YARN-1368:
---

bq. Unless I'm missing something, Anubhav was working on this JIRA. It is great 
that Jian did the refactoring to have common code for the schedulers and some 
testcases for it, but most of the work has been done by Anubhav and he was 
working actively on it. We should reassign the JIRA back to Anubhav and let him 
drive it to completion, agree?
[~tucu00], if you see the history of this ticket, there was nothing here that 
showed that Anubhav was working on this. There was a prototype patch on the 
umbrella JIRA, but the part of the patch that focuses on this JIRA was 
completely off the mark. It is specifically for these reasons that I publicly 
discouraged posting a prototype patch for this well-defined feature. In this 
case, the prototype was off the mark, and folks (not just [~jianhe]) were 
confused about the individual tasks filed.

bq. Jian He, I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead hijacking the JIRA, the correct way 
should have been proposing to the assignee/author of the original patch the 
changes and offering to contribute/breakdown tasks. Please do so next time.
[~tucu00], I just looked at the prototype as well as this patch. Not only is this 
patch taking a different approach, it is also *not* based on the work in the 
prototype. The approach in the prototype is pretty broken - everyone's free to 
look back on both the patches and argue about this - this one aims at 
correcting it by starting from scratch and moving beyond to set the right stage 
for the entire effort.

Even if it were, it is sometimes a major burden to correct original patches 
and direct them towards the right approach, specifically when those approaches 
are completely botched. This one's one of those cases.

I agree in general that he shouldn't have re-purposed this JIRA. This JIRA's 
original purpose was a very small subset of what it is doing now.

> Common work to re-populate containers’ state into scheduler
> ---
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
> YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
> YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2112) Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml

2014-05-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012764#comment-14012764
 ] 

Hudson commented on YARN-2112:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5623 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5623/])
YARN-2112. Fixed yarn-common's pom.xml to include jackson dependencies so that 
both Timeline Server and client can access them. Contributed by Zhijie Shen. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598373)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/pom.xml


> Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml
> -
>
> Key: YARN-2112
> URL: https://issues.apache.org/jira/browse/YARN-2112
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: 2.5.0
>
> Attachments: YARN-2112.1.patch
>
>
> Now YarnClient is using TimelineClient, which has a dependency on jackson 
> libs. However, the current dependency configurations make the hadoop-client 
> artifact miss 2 jackson libs, such that applications which have a 
> hadoop-client dependency will see the following exception:
> {code}
> java.lang.NoClassDefFoundError: 
> org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.<init>(TimelineClientImpl.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:44)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:149)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:94)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:88)
>   at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:111)
>   at 
> org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
>   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
>   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
>   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:394)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>   at org.apache.hadoop.util.ProgramDriver.run(Program

[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012761#comment-14012761
 ] 

Hadoop QA commented on YARN-2091:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647416/YARN-2091.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3858//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3858//console

This message is automatically generated.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012750#comment-14012750
 ] 

Vinod Kumar Vavilapalli commented on YARN-1368:
---

Started looking at this. It is too big a patch. Can you split it into pieces: 
one for the RM-NM protocol stuff and one for the rest?

> Common work to re-populate containers’ state into scheduler
> ---
>
> Key: YARN-1368
> URL: https://issues.apache.org/jira/browse/YARN-1368
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
> YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
> YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2112) Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012731#comment-14012731
 ] 

Vinod Kumar Vavilapalli commented on YARN-2112:
---

Looks good. Checking this in.

> Hadoop-client is missing jackson libs due to inappropriate configs in pom.xml
> -
>
> Key: YARN-2112
> URL: https://issues.apache.org/jira/browse/YARN-2112
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2112.1.patch
>
>
> Now YarnClient is using TimelineClient, which has a dependency on jackson 
> libs. However, the current dependency configurations make the hadoop-client 
> artifact miss 2 jackson libs, such that applications which have a 
> hadoop-client dependency will see the following exception:
> {code}
> java.lang.NoClassDefFoundError: 
> org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.<init>(TimelineClientImpl.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:44)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:149)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:94)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:88)
>   at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:111)
>   at 
> org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
>   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
>   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
>   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:394)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.lang.ClassNotFoundException: 
> org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> 

[jira] [Commented] (YARN-2054) Better defaults for YARN ZK configs for retries and retry-inteval when HA is enabled

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012710#comment-14012710
 ] 

Tsuyoshi OZAWA commented on YARN-2054:
--

+1 (non-binding)

> Better defaults for YARN ZK configs for retries and retry-inteval when HA is 
> enabled
> 
>
> Key: YARN-2054
> URL: https://issues.apache.org/jira/browse/YARN-2054
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2054-1.patch, yarn-2054-2.patch, yarn-2054-3.patch, 
> yarn-2054-4.patch
>
>
> Currently, we have the following default values:
> # yarn.resourcemanager.zk-num-retries - 500
> # yarn.resourcemanager.zk-retry-interval-ms - 2000
> This leads to a cumulative 1000 seconds (500 retries x 2000 ms) before the 
> RM gives up trying to connect to ZK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2054) Better defaults for YARN ZK configs for retries and retry-inteval when HA is enabled

2014-05-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2054:
---

Summary: Better defaults for YARN ZK configs for retries and retry-inteval 
when HA is enabled  (was: Poor defaults for YARN ZK configs for retries and 
retry-inteval)

> Better defaults for YARN ZK configs for retries and retry-inteval when HA is 
> enabled
> 
>
> Key: YARN-2054
> URL: https://issues.apache.org/jira/browse/YARN-2054
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2054-1.patch, yarn-2054-2.patch, yarn-2054-3.patch, 
> yarn-2054-4.patch
>
>
> Currently, we have the following default values:
> # yarn.resourcemanager.zk-num-retries - 500
> # yarn.resourcemanager.zk-retry-interval-ms - 2000
> This leads to a cumulative 1000 seconds (500 retries x 2000 ms) before the 
> RM gives up trying to connect to ZK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2091:
-

Attachment: YARN-2091.2.patch

Fixed to pass the test.

> Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
> ---
>
> Key: YARN-2091
> URL: https://issues.apache.org/jira/browse/YARN-2091
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Bikas Saha
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2091.1.patch, YARN-2091.2.patch
>
>
> Currently, the AM cannot programmatically determine if the task was killed 
> due to using excessive memory. The NM kills it without passing this 
> information in the container status back to the RM. So the AM cannot take any 
> action here. The jira tracks adding this exit status and passing it from the 
> NM to the RM and then the AM. In general, there may be other such actions 
> taken by YARN that are currently opaque to the AM. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2054) Poor defaults for YARN ZK configs for retries and retry-inteval

2014-05-29 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012674#comment-14012674
 ] 

Sandy Ryza commented on YARN-2054:
--

+1

> Poor defaults for YARN ZK configs for retries and retry-inteval
> ---
>
> Key: YARN-2054
> URL: https://issues.apache.org/jira/browse/YARN-2054
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2054-1.patch, yarn-2054-2.patch, yarn-2054-3.patch, 
> yarn-2054-4.patch
>
>
> Currently, we have the following default values:
> # yarn.resourcemanager.zk-num-retries - 500
> # yarn.resourcemanager.zk-retry-interval-ms - 2000
> This leads to a cumulative 1000 seconds (500 retries x 2000 ms) before the 
> RM gives up trying to connect to ZK.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012642#comment-14012642
 ] 

Hadoop QA commented on YARN-1474:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647393/YARN-1474.19.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 9 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/3857//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3857//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3857//console

This message is automatically generated.

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
> YARN-1474.17.patch, YARN-1474.18.patch, YARN-1474.19.patch, 
> YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, 
> YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder

2014-05-29 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012572#comment-14012572
 ] 

Junping Du commented on YARN-2103:
--

Thanks [~decster] for the patch and [~ozawa] for review.
bq. How about adding concrete tests as a first step of generic tests on 
YARN-2051? What do you think, Junping Du?
That sounds good to me. We should add some concrete tests for the most 
suspect PBImpls as a first step, and refactor to/add more generic tests 
later. In this case, the original constructor (the one without parameters) is 
wrong, as it creates the object with viaProto=false and a null builder at the 
same time. We should add a unit test here.

On code format, please try to follow our code convention from the hadoop wiki 
- the Sun/Oracle Java programming convention (the only difference is using 
2-space indentation instead of 4). In this convention, we have:
{code}
Note: if statements always use braces {}. Avoid the following error-prone form:
if (condition) //AVOID! THIS OMITS THE BRACES {}!
    statement;
{code}
So maybe we should try to get this right for all newly added code instead of 
staying consistent with the legacy code?
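
For what it's worth, a minimal sketch of the kind of test being asked for, 
assuming the usual PBImpl contract (a record built via the no-arg constructor 
must be immediately usable, i.e. viaProto=false implies a non-null builder); 
the init/getMessage entry points are used illustratively:
{code}
import org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl;
import org.junit.Assert;
import org.junit.Test;

public class TestSerializedExceptionPBImpl {
  // Sketch only: if the no-arg constructor left builder == null while
  // viaProto == false, the first mutation below would throw an NPE.
  @Test
  public void testDefaultConstructorYieldsUsableRecord() {
    SerializedExceptionPBImpl pb = new SerializedExceptionPBImpl();
    pb.init(new Exception("test"));
    Assert.assertNotNull(pb.getMessage());
  }
}
{code}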

> Inconsistency between viaProto flag and initial value of 
> SerializedExceptionProto.Builder
> -
>
> Key: YARN-2103
> URL: https://issues.apache.org/jira/browse/YARN-2103
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Binglin Chang
>Assignee: Binglin Chang
> Attachments: YARN-2103.v1.patch
>
>
> Bug 1:
> {code}
>   SerializedExceptionProto proto = SerializedExceptionProto
>   .getDefaultInstance();
>   SerializedExceptionProto.Builder builder = null;
>   boolean viaProto = false;
> {code}
> Since viaProto is false, we should initialize the builder rather than the 
> proto.
> Bug 2:
> the class does not provide hashCode() and equals() like other PBImpl records 
> do; since this class is used in other records, it may affect those records' 
> behavior.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1474) Make schedulers services

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1474:
-

Attachment: YARN-1474.19.patch

Rebased on trunk.

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
> YARN-1474.17.patch, YARN-1474.18.patch, YARN-1474.19.patch, 
> YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, 
> YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2114) Inform container of container-specific local directories

2014-05-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012544#comment-14012544
 ] 

Jason Lowe commented on YARN-2114:
--

Currently a container can obtain a list of local directories to use by 
examining the LOCAL_DIRS environment variable.  However, these directories 
have a lifespan that matches the application (i.e. they will only be deleted 
when the entire application completes).  Therefore, if a container writes some 
temporary data to these directories and then crashes or otherwise orphans that 
data, the data won't be cleaned up when the container completes but rather 
only when the entire application completes.  There are use cases for both: 
data that survives as long as the application is active, and data that only 
survives as long as the container is active.

Given the way YARN works today, a container can take the list of directories 
from LOCAL_DIRS and tack on the CONTAINER_ID to find these directories.  
However, that might not be forward-compatible unless we commit to it always 
working.  It would be cleaner if there were a separate variable, maybe 
CONTAINER_LOCAL_DIRS, that listed the directories that are container-specific 
rather than app-specific.
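
To make that concrete, a sketch of today's workaround, assuming the standard 
LOCAL_DIRS and CONTAINER_ID environment variables; the derived layout is 
exactly the part we would have to commit to for this to stay 
forward-compatible:
{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch of the workaround described above (not a committed layout):
// derive per-container dirs by appending the container ID to each
// application-level local dir taken from the environment.
public class ContainerLocalDirsSketch {
  public static List<String> containerLocalDirs() {
    String[] appDirs = System.getenv("LOCAL_DIRS").split(",");
    String containerId = System.getenv("CONTAINER_ID");
    List<String> dirs = new ArrayList<String>();
    for (String appDir : appDirs) {
      dirs.add(appDir + File.separator + containerId);
    }
    return dirs;
  }
}
{code}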

> Inform container of container-specific local directories
> 
>
> Key: YARN-2114
> URL: https://issues.apache.org/jira/browse/YARN-2114
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>
> It would be nice if a container could know which local directories it can 
> use for temporary data, with those directories automatically cleaned up when 
> the container exits.  The current working directory is one of those 
> directories, but it's tricky (and potentially not forward-compatible) to 
> determine the other directories to use on a multi-disk node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012519#comment-14012519
 ] 

Tsuyoshi OZAWA commented on YARN-1474:
--

Need to rebase on YARN-596. Please wait a moment.

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
> YARN-1474.17.patch, YARN-1474.18.patch, YARN-1474.2.patch, YARN-1474.3.patch, 
> YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, 
> YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2114) Inform container of container-specific local directories

2014-05-29 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-2114:


 Summary: Inform container of container-specific local directories
 Key: YARN-2114
 URL: https://issues.apache.org/jira/browse/YARN-2114
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.5.0
Reporter: Jason Lowe


It would be nice if a container could know which local directories it can use 
for temporary data, with those directories automatically cleaned up when the 
container exits.  The current working directory is one of those directories, 
but it's tricky (and potentially not forward-compatible) to determine the 
other directories to use on a multi-disk node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

2014-05-29 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012451#comment-14012451
 ] 

Junping Du commented on YARN-1338:
--

+1. Patch looks good to me. Thanks for this significant patch contribution, 
[~jlowe]! I will commit it shortly.

> Recover localized resource cache state upon nodemanager restart
> ---
>
> Key: YARN-1338
> URL: https://issues.apache.org/jira/browse/YARN-1338
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-1338.patch, YARN-1338v2.patch, 
> YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch, 
> YARN-1338v6.patch
>
>
> Today, when the node manager restarts, we clean up all the distributed cache 
> files from disk. This is definitely not ideal, for 2 reasons:
> * For work-preserving restart, we definitely want them, as running 
> containers are using them.
> * Even for non-work-preserving restart, this will be useful in the sense 
> that we don't have to download them again if they're needed by future tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1474) Make schedulers services

2014-05-29 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012444#comment-14012444
 ] 

Tsuyoshi OZAWA commented on YARN-1474:
--

IIUC, the javadoc warning is not related to the latest patch, which doesn't 
include any changes to TestCompositeService. Please let me know if I'm wrong.

> Make schedulers services
> 
>
> Key: YARN-1474
> URL: https://issues.apache.org/jira/browse/YARN-1474
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Sandy Ryza
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-1474.1.patch, YARN-1474.10.patch, 
> YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, 
> YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, 
> YARN-1474.17.patch, YARN-1474.18.patch, YARN-1474.2.patch, YARN-1474.3.patch, 
> YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, 
> YARN-1474.8.patch, YARN-1474.9.patch
>
>
> Schedulers currently have a reinitialize but no start and stop.  Fitting them 
> into the YARN service model would make things more coherent.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-05-29 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012407#comment-14012407
 ] 

Remus Rusanu commented on YARN-1972:


Some responses in the meantime, before I finish the design doc:

> What are the requirements on the NodeManager user?
It must be a member of the local administrators group or LocalSystem. That 
means the equivalent of *nix 'root'. This is a requirement derived from the 
need to call 
[`LoadUserProfile()`](http://msdn.microsoft.com/en-us/library/windows/desktop/bb762281(v=vs.85).aspx)
 which documents that "the caller must be an administrator or the LocalSystem 
account. It is not sufficient for the caller to merely impersonate the 
administrator or LocalSystem account." All in all, a very high privilege is 
required for NM. We are considering a future iteration in which we extract the 
privileged operations into a dedicated NT service (=daemon) and bestow the 
high privileges only on this service.

> You are launching so many commands for every container - to chown files, to 
> copy files etc.
We'll measure. The obvious problem, imho, is the many process spawns implied 
by chmod/chown/symlink, which are all implemented via winutils. I believe 
these should be addressed by moving those operations into NativeIO and 
invoking them via JNI, avoiding the process creation cost (significant on 
Windows). I don't think that moving the localization itself into native code 
would result in much benefit over a proper Java implementation.
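
A minimal sketch of that direction, assuming NativeIO exposes the operation 
in-process (the exact entry point and path shown are illustrative):
{code}
import org.apache.hadoop.io.nativeio.NativeIO;

// Sketch only: change permissions in-process via JNI instead of
// spawning a winutils process for every chmod/chown/symlink.
public class InProcessChmodSketch {
  public static void main(String[] args) throws Exception {
    NativeIO.POSIX.chmod("/path/to/container/dir", 0750);
  }
}
{code}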

> Localizer already does createUserLocalDirs 
I didn't notice this. I've seen the DCE do this, so I assumed it needed to be 
done. Since the Localizer would run as the task user, letting the Localizer 
create these dirs removes the need to chown them after creation; they will be 
created 'as needed' out-of-the-box. A double win :)

> skips things like the setting niceness 
We can certainly add niceness to WCE as well; the OS supports it. I opted not 
to, as it can be added later as an incremental step (trying to keep this patch 
a manageable size).

>  Why cannot we simply use the localizerId?
I was getting duplicate errors because of task retries. In my experiments 
(2.2-based), the localizerId was definitely not unique enough.



> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Attachments: YARN-1972.1.patch
>
>
> This work item represents the Java-side changes required to implement a 
> secure Windows container executor, based on the YARN-1063 changes on the 
> native/winutils side. 
> Necessary changes include leveraging the winutils task createas to launch 
> the container process as the required user, and a secure localizer (launching 
> localization as a separate process running as the container user).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012376#comment-14012376
 ] 

Hudson commented on YARN-2107:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1758 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1758/])
YARN-2107. Refactored timeline classes into o.a.h.y.s.timeline package. 
Contributed by Vinod Kumar Vavilapalli. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598094)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityIdentifier.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/GenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/NameValuePair.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineReader.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineWriter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/package-info.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineClientAuthenticationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp
* 
/hadoop/common/trunk/hadoop-yarn-project/had

[jira] [Commented] (YARN-596) Use scheduling policies throughout the queue hierarchy to decide which containers to preempt

2014-05-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012379#comment-14012379
 ] 

Hudson commented on YARN-596:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #1758 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1758/])
YARN-596. Use scheduling policies throughout the queue hierarchy to decide 
which containers to preempt (Wei Yan via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598197)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java


> Use scheduling policies throughout the queue hierarchy to decide which 
> containers to preempt
> 
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Wei Yan
> Fix For: 2.5.0
>
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package

2014-05-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012279#comment-14012279
 ] 

Hudson commented on YARN-2107:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #567 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/567/])
YARN-2107. Refactored timeline classes into o.a.h.y.s.timeline package. 
Contributed by Vinod Kumar Vavilapalli. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598094)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/timeline
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityIdentifier.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/GenericObjectMapper.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/NameValuePair.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineReader.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineWriter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/package-info.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineAuthenticationFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineClientAuthenticationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineDelegationTokenSecretManagerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoo

[jira] [Commented] (YARN-596) Use scheduling policies throughout the queue hierarchy to decide which containers to preempt

2014-05-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012282#comment-14012282
 ] 

Hudson commented on YARN-596:
-

FAILURE: Integrated in Hadoop-Yarn-trunk #567 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/567/])
YARN-596. Use scheduling policies throughout the queue hierarchy to decide 
which containers to preempt (Wei Yan via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1598197)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java


> Use scheduling policies throughout the queue hierarchy to decide which 
> containers to preempt
> 
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Wei Yan
> Fix For: 2.5.0
>
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.
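
To make the shielding concrete, here is a minimal standalone sketch of the 
ordering described above. The Container record, app names, and priority 
values are illustrative stand-ins, not the actual FairScheduler code; recall 
that in YARN a smaller priority number means a higher priority.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the preemption ordering this issue criticizes.
public class PreemptionOrderSketch {
  record Container(String app, int priority) {} // smaller number = higher priority

  public static void main(String[] args) {
    // Both apps sit in queues over their fair share, so all of their
    // containers land in a single candidate list.
    List<Container> candidates = new ArrayList<>(List.of(
        new Container("appA", 10),   // appA requested at a low priority
        new Container("appB", 1)));  // appB requested everything at priority 1

    // Sorting the global list by requested priority means the
    // lowest-priority containers are preempted first, so appB shields
    // itself simply by asking for its containers at a higher priority.
    candidates.sort(Comparator.comparingInt(Container::priority).reversed());
    System.out.println(candidates); // appA's container would be preempted first
  }
}
{code}

Replacing this single global ordering with per-queue decisions made by each 
queue's own scheduling policy, as the summary of this issue proposes, removes 
that loophole.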



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

2014-05-29 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012235#comment-14012235
 ] 

Junping Du commented on YARN-1338:
--

bq. Good point. I added shutdown code that removes the recovery directory if 
the shutdown is due to a decommission. I also added a unit test for this 
scenario.
Thanks for addressing my comments, Jason!
bq. The last component of localDir is the unique resource ID and not a 
directory managed by the local cache directory manager.
I see. It is really confusing, and we'd better add some documentation 
somewhere (it doesn't have to be in this patch, though, given this one is big 
enough).
I will review it again today.
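
For readers following along, a hedged sketch of the decommission-only cleanup 
described above might look like the following; every name here is an 
illustrative assumption, not the actual YARN-1338 patch:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch: keep recovery state across ordinary restarts,
// delete it only when the shutdown is due to a decommission.
public class RecoveryDirSketch {
  static void shutdown(Path recoveryDir, boolean decommissioned) throws IOException {
    if (!decommissioned) {
      return; // work-preserving restart: leave localized-resource state on disk
    }
    if (Files.exists(recoveryDir)) {
      // Decommission: the node is not coming back, so drop the recovery state.
      try (Stream<Path> walk = Files.walk(recoveryDir)) {
        walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("nm-recovery");
    Files.writeString(dir.resolve("resource-state"), "tracked local paths...");
    shutdown(dir, true);
    System.out.println(Files.exists(dir)); // false: state removed on decommission
  }
}
{code}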

> Recover localized resource cache state upon nodemanager restart
> ---
>
> Key: YARN-1338
> URL: https://issues.apache.org/jira/browse/YARN-1338
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-1338.patch, YARN-1338v2.patch, 
> YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch, 
> YARN-1338v6.patch
>
>
> Today when the node manager restarts, we clean up all the distributed cache 
> files from disk. This is definitely not ideal, for two reasons:
> * For a work-preserving restart we definitely want to keep them, as running 
> containers are using them.
> * Even for a non-work-preserving restart, keeping them is useful because we 
> don't have to download them again if future tasks need them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012223#comment-14012223
 ] 

Hadoop QA commented on YARN-2083:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647328/YARN-2083.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3856//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3856//console

This message is automatically generated.

> In fair scheduler, Queue should not been assigned more containers when its 
> usedResource had reach the maxResource limit
> ---
>
> Key: YARN-2083
> URL: https://issues.apache.org/jira/browse/YARN-2083
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: Yi Tian
>  Labels: assignContainer, fair, scheduler
> Fix For: 2.3.0
>
> Attachments: YARN-2083.patch
>
>
> In the fair scheduler, FSParentQueue and FSLeafQueue do an 
> assignContainerPreCheck to guarantee the queue is not over its limit.
> But the fitsIn function in Resource.java does not return false when the 
> usedResource equals the maxResource.
> I think we should create a new function, "fitsInWithoutEqual", to use instead 
> of "fitsIn" in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2098) App priority support in Fair Scheduler

2014-05-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012196#comment-14012196
 ] 

Hadoop QA commented on YARN-2098:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647316/YARN-2098.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3855//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3855//console

This message is automatically generated.

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.5.0
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This jira is created to support app priorities in the fair scheduler. 
> AppSchedulable hard-codes the priority of apps to 1; we should change this 
> to get the priority from the ApplicationSubmissionContext.
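
A minimal sketch of the proposed change follows; the method names and the 
null-based fallback are illustrative assumptions, not the actual patch:

{code:java}
// Hypothetical before/after sketch for reading the app priority from the
// submission context instead of hard-coding it.
public class AppPrioritySketch {
  static final int DEFAULT_PRIORITY = 1;

  // Before: every app gets the same hard-coded priority.
  static int priorityBefore() {
    return DEFAULT_PRIORITY;
  }

  // After: use the priority the client set on its
  // ApplicationSubmissionContext, falling back to the default when unset.
  static int priorityAfter(Integer submissionContextPriority) {
    return submissionContextPriority != null ? submissionContextPriority : DEFAULT_PRIORITY;
  }

  public static void main(String[] args) {
    System.out.println(priorityBefore());    // 1
    System.out.println(priorityAfter(5));    // 5
    System.out.println(priorityAfter(null)); // 1
  }
}
{code}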



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-29 Thread Yi Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Tian updated YARN-2083:
--

Attachment: (was: YARN-2083.patch)

> In fair scheduler, Queue should not been assigned more containers when its 
> usedResource had reach the maxResource limit
> ---
>
> Key: YARN-2083
> URL: https://issues.apache.org/jira/browse/YARN-2083
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: Yi Tian
>  Labels: assignContainer, fair, scheduler
> Fix For: 2.3.0
>
> Attachments: YARN-2083.patch
>
>
> In the fair scheduler, FSParentQueue and FSLeafQueue do an 
> assignContainerPreCheck to guarantee the queue is not over its limit.
> But the fitsIn function in Resource.java does not return false when the 
> usedResource equals the maxResource.
> I think we should create a new function, "fitsInWithoutEqual", to use instead 
> of "fitsIn" in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-05-29 Thread Yi Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Tian updated YARN-2083:
--

Attachment: YARN-2083.patch

> In fair scheduler, Queue should not been assigned more containers when its 
> usedResource had reach the maxResource limit
> ---
>
> Key: YARN-2083
> URL: https://issues.apache.org/jira/browse/YARN-2083
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.3.0
>Reporter: Yi Tian
>  Labels: assignContainer, fair, scheduler
> Fix For: 2.3.0
>
> Attachments: YARN-2083.patch
>
>
> In the fair scheduler, FSParentQueue and FSLeafQueue do an 
> assignContainerPreCheck to guarantee the queue is not over its limit.
> But the fitsIn function in Resource.java does not return false when the 
> usedResource equals the maxResource.
> I think we should create a new function, "fitsInWithoutEqual", to use instead 
> of "fitsIn" in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-596) Use scheduling policies throughout the queue hierarchy to decide which containers to preempt

2014-05-29 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan reassigned YARN-596:


Assignee: Wei Yan  (was: Sandy Ryza)

> Use scheduling policies throughout the queue hierarchy to decide which 
> containers to preempt
> 
>
> Key: YARN-596
> URL: https://issues.apache.org/jira/browse/YARN-596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Wei Yan
> Fix For: 2.5.0
>
> Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, 
> YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch
>
>
> In the fair scheduler, containers are chosen for preemption in the following 
> way:
> All containers for all apps that are in queues that are over their fair share 
> are put in a list.
> The list is sorted in order of the priority that the container was requested 
> in.
> This means that an application can shield itself from preemption by 
> requesting its containers at higher priorities, which doesn't really make 
> sense.
> Also, an application that is not over its fair share, but that is in a queue 
> that is over its fair share, is just as likely to have containers preempted 
> as an application that is over its fair share.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler

2014-05-29 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2098:
--

Attachment: YARN-2098.patch

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.5.0
>Reporter: Ashwin Shankar
>Assignee: Wei Yan
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This jira is created to support app priorities in the fair scheduler. 
> AppSchedulable hard-codes the priority of apps to 1; we should change this 
> to get the priority from the ApplicationSubmissionContext.



--
This message was sent by Atlassian JIRA
(v6.2#6252)