[jira] [Updated] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common

2014-07-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2347:
-

Attachment: YARN-2347-v2.patch

The findbugs issue is not related to this patch; however, it is fixed in the v2 patch.

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.
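
For illustration, here is a minimal sketch of what a consolidated version record in yarn-server-common could look like; the class and method names below are assumptions for the sketch, not necessarily what the patch introduces.
{code}
// A minimal sketch, based only on the description above, of a single
// major/minor version record usable by the RM state store, the NM DB schema
// and the timeline store alike. Names are illustrative only.
public abstract class StateVersionSketch {
  public abstract int getMajorVersion();
  public abstract int getMinorVersion();

  // One common convention: state written with the same major version is
  // readable by this software, regardless of minor version.
  public boolean isCompatibleTo(StateVersionSketch other) {
    return getMajorVersion() == other.getMajorVersion();
  }
}
{code}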





[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072946#comment-14072946
 ] 

Hadoop QA commented on YARN-2347:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657562/YARN-2347-v2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4411//console

This message is automatically generated.

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.





[jira] [Updated] (YARN-2172) Suspend/Resume Hadoop Jobs

2014-07-24 Thread Richard Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Chen updated YARN-2172:
---

Attachment: Hadoop Job Suspend Resume Design.docx

Design Document for Hadoop Job Suspend/Resume Implementation

 Suspend/Resume Hadoop Jobs
 --

 Key: YARN-2172
 URL: https://issues.apache.org/jira/browse/YARN-2172
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager, webapp
Affects Versions: 2.2.0
 Environment: CentOS 6.5, Hadoop 2.2.0
Reporter: Richard Chen
  Labels: hadoop, jobs, resume, suspend
 Fix For: 2.2.0

 Attachments: Hadoop Job Suspend Resume Design.docx

   Original Estimate: 336h
  Remaining Estimate: 336h

 In a multi-application cluster environment, jobs running inside Hadoop YARN 
 may be of lower priority than jobs running outside Hadoop YARN, such as HBase. To 
 give way to other higher-priority jobs inside Hadoop, a user or some 
 cluster-level resource scheduling service should be able to suspend and/or 
 resume particular jobs within Hadoop YARN.
 When target jobs inside Hadoop are suspended, the already allocated and 
 running task containers will continue to run until their completion or active 
 preemption by other means, but no new containers will be allocated to 
 the target jobs. In contrast, when suspended jobs are put into resume mode, 
 they will continue from their previous progress and have new task 
 containers allocated to complete the rest of the job.
 My team has completed its implementation and our tests showed it works in a 
 rather solid way. 
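
For illustration, here is a hedged sketch of the allocation-side behaviour described above; it is not taken from the attached design document, and all names in it are invented.
{code}
// Illustrative only: a suspended application keeps its running containers but
// is skipped when new containers are handed out; clearing the flag lets
// allocation resume from the recorded progress.
class SuspendableAppSketch {
  static class Resource {}
  static class Container {}

  private volatile boolean suspended;

  void suspend() { suspended = true; }
  void resume()  { suspended = false; }

  // Called by a scheduler loop when a node reports spare capacity.
  Container tryAllocate(Resource spare) {
    if (suspended) {
      return null;               // keep existing containers, grant no new ones
    }
    return new Container();      // placeholder for the normal allocation path
  }
}
{code}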





[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user

2014-07-24 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-1063:
---

Attachment: (was: YARN-1063.5.patch)

 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and needs to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process 
 without granting rights to other processes launched on the same machine, but 
 found this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard windows APIs. At some point in the future 
 windows may support functionality similar to BSD jails or Linux containers, 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to the desktop so will not be 
 able to display any information or create UI.
 * The launched process will have no network credentials. Any access of 
 network resources that requires domain authentication will fail.
 h2. Implementation:
 Winutils performs the following steps:
 # Enable the required privileges for the current process.
 # Register as a trusted process with the Local Security Authority (LSA).
 # Create a new logon for the user passed on the command line.
 # Load/Create a profile on the local machine for the new logon.
 # Create a new environment for the new logon.
 # Launch the new process in a job with the task name specified and using the 
 created logon.
 # Wait for the JOB to exit.
 h2. Future work:
 The following work was scoped out of this check in:
 * Support for non-domain users or machines that are not domain joined.
 * Support for privilege isolation by running the task launcher in a high 
 privilege service with access over an ACLed named pipe.
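
As a rough illustration of the createAsUser syntax in the Usage section above, the following Java sketch simply shells out to the documented command; it is not the NodeManager's actual launch code, and the class, method and parameter names are invented for illustration.
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WinutilsTaskLauncherSketch {
  // Hypothetical helper: builds "winutils task createAsUser TASKNAME USERNAME
  // COMMAND_LINE" and waits for it to finish.
  public static int createTaskAsUser(String winutilsPath, String taskName,
      String userAtDomain, List<String> commandLine)
      throws IOException, InterruptedException {
    List<String> cmd = new ArrayList<String>();
    cmd.add(winutilsPath);
    cmd.addAll(Arrays.asList("task", "createAsUser", taskName, userAtDomain));
    cmd.addAll(commandLine);
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    // winutils itself registers with the LSA, creates the logon, loads the
    // profile and waits for the job, so the caller only waits for the exit code.
    return p.waitFor();
  }
}
{code}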





[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user

2014-07-24 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-1063:
---

Attachment: YARN-1063.5.patch

I have reloaded patch .5. The previous upload had a whitespace diff that 
prevented it from applying to trunk. I fixed my local branch to remove the 
whitespace-only diffs from trunk and re-created patch .5. 

 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.5.patch, YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and needs to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process 
 without granting rights to other processes launched on the same machine, but 
 found this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard windows APIs. At some point in the future 
 windows may support functionality similar to BSD jails or Linux containers, 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to the desktop so will not be 
 able to display any information or create UI.
 * The launched process will have no network credentials. Any access of 
 network resources that requires domain authentication will fail.
 h2. Implementation:
 Winutils performs the following steps:
 # Enable the required privileges for the current process.
 # Register as a trusted process with the Local Security Authority (LSA).
 # Create a new logon for the user passed on the command line.
 # Load/Create a profile on the local machine for the new logon.
 # Create a new environment for the new logon.
 # Launch the new process in a job with the task name specified and using the 
 created logon.
 # Wait for the JOB to exit.
 h2. Future work:
 The following work was scoped out of this check in:
 * Support for non-domain users or machines that are not domain joined.
 * Support for privilege isolation by running the task launcher in a high 
 privilege service with access over an ACLed named pipe.





[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree

2014-07-24 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072986#comment-14072986
 ] 

Akira AJISAKA commented on YARN-2336:
-

The patch looks mostly good. I built a pseudo-distributed cluster and verified 
the JSON response.
Some minor comments:
{code}
FSLeafQueue leaf1 = queueManager.getLeafQueue("root.q.subqueue1", true);
FSLeafQueue leaf2 = queueManager.getLeafQueue("root.q.subqueue2", true);
{code}
In the test, the above code only creates the leaf queues, and leaf1 and leaf2 are 
unused, so I think it's better to drop the variables and add a comment as follows:
{code}
// create LeafQueue
queueManager.getLeafQueue("root.q.subqueue1", true);
queueManager.getLeafQueue("root.q.subqueue2", true);
{code}

{code}
  public void testClusterSchedulerWithSubQueues() throws JSONException, 
Exception {
{code}
Would you wrap this line to fit within 80 characters?
In addition, would you please remove the unused import from 
FairSchedulerQueueInfo.java?

 Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
 --

 Key: YARN-2336
 URL: https://issues.apache.org/jira/browse/YARN-2336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Kenji Kikushima
Assignee: Kenji Kikushima
 Attachments: YARN-2336-2.patch, YARN-2336.patch


 When we have sub queues in the Fair Scheduler, the REST API returns JSON with a 
 missing '[' bracket for childQueues.
 This issue was found by [~ajisakaa] at YARN-1050.





[jira] [Commented] (YARN-1050) Document the Fair Scheduler REST API

2014-07-24 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072990#comment-14072990
 ] 

Akira AJISAKA commented on YARN-1050:
-

[~kj-ki], thank you for filing a JIRA and creating a patch for the issue.
Committers, please review YARN-2336 first. The patch needs to be updated after 
YARN-2336 is committed.

 Document the Fair Scheduler REST API
 

 Key: YARN-1050
 URL: https://issues.apache.org/jira/browse/YARN-1050
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Sandy Ryza
Assignee: Kenji Kikushima
 Attachments: YARN-1050-2.patch, YARN-1050-3.patch, YARN-1050.patch


 The documentation should be placed here along with the Capacity Scheduler 
 documentation: 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API





[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073008#comment-14073008
 ] 

Hadoop QA commented on YARN-1063:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657572/YARN-1063.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1259 javac 
compiler warnings (more than the trunk's current 1258 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.ipc.TestIPC

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4412//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4412//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4412//console

This message is automatically generated.

 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.5.patch, YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and needs to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process 
 without granting rights to other processes launched on the same machine, but 
 found this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard windows APIs. At some point in the future 
 windows may support functionality similar to BSD jails or Linux containers, 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - 

[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073030#comment-14073030
 ] 

Tsuyoshi OZAWA commented on YARN-2229:
--

[~sseth], thanks for your comment.

[~jianhe], [~zjshen], after reading the comment by Zhijie, I think the [first 
design|https://issues.apache.org/jira/browse/YARN-2229] looks better because of 
cluster-level backward compatibility. Can we agree on going with the first 
design?

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
 YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
 YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed the containerId format: the upper 10 bits are for the 
 epoch and the lower 22 bits are for the sequence number of ids. This is for 
 preserving the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow 
 after the RM restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new container id format on this JIRA while preserving backward 
 compatibility.
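
For illustration, here is a small sketch based only on the layout described above (10 epoch bits over 22 sequence bits in a 32-bit id); it is not the ContainerId implementation, and the names are invented.
{code}
// With 10 bits the epoch wraps after 2^10 = 1024 RM restarts, which is the
// overflow concern; widening the id to a long removes that limit.
public class ContainerIdLayoutSketch {
  static final int SEQ_BITS = 22;
  static final int SEQ_MASK = (1 << SEQ_BITS) - 1;

  static int pack(int epoch, int sequence) {
    return (epoch << SEQ_BITS) | (sequence & SEQ_MASK);
  }

  static int epochOf(int id)    { return id >>> SEQ_BITS; }
  static int sequenceOf(int id) { return id & SEQ_MASK; }
}
{code}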





[jira] [Commented] (YARN-2313) Livelock can occur in FairScheduler when there are lots of running apps

2014-07-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073031#comment-14073031
 ] 

Tsuyoshi OZAWA commented on YARN-2313:
--

[~kkambatl], thank you for your suggestion. It sounds reasonable and good to 
me. I'll open a new JIRA to address the maintenance thread.

 Livelock can occur in FairScheduler when there are lots of running apps
 ---

 Key: YARN-2313
 URL: https://issues.apache.org/jira/browse/YARN-2313
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, 
 YARN-2313.4.patch, rm-stack-trace.txt


 Observed a livelock in FairScheduler when there are lots of entries in the 
 queue. After investigating the code, the following case can occur:
 1. {{update()}} called by UpdateThread takes longer than 
 UPDATE_INTERVAL (500ms) if there are lots of queues.
 2. UpdateThread goes into a busy loop.
 3. Other threads (AllocationFileReloader, 
 ResourceManager$SchedulerEventDispatcher) can wait forever.
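
For illustration, a minimal sketch of the loop shape described above, assuming the update thread only sleeps for whatever remains of the 500 ms interval; this is a simplification, not the FairScheduler source.
{code}
// When update() takes longer than UPDATE_INTERVAL there is nothing left to
// sleep, so the thread spins straight back into update() while holding the
// scheduler lock, and other threads contending for that lock can wait
// indefinitely.
class UpdateLoopSketch extends Thread {
  static final long UPDATE_INTERVAL = 500;   // ms
  private final Object schedulerLock = new Object();

  @Override
  public void run() {
    while (!isInterrupted()) {
      long start = System.currentTimeMillis();
      synchronized (schedulerLock) {
        update();                            // slow when there are many queues
      }
      long left = UPDATE_INTERVAL - (System.currentTimeMillis() - start);
      if (left > 0) {
        try {
          Thread.sleep(left);
        } catch (InterruptedException e) {
          return;
        }
      }
      // left <= 0: no sleep at all -- the busy loop described in step 2
    }
  }

  void update() { /* recompute fair shares for every queue */ }
}
{code}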





[jira] [Updated] (YARN-1063) Winutils needs ability to create task as domain user

2014-07-24 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-1063:
---

Attachment: YARN-1063.6.patch

Patch .6 fixes the extra warning. The IPC test failure is, I believe, 
infrastructure-related, not patch-related.

 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and needs to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process 
 without granting rights to other processes launched on the same machine, but 
 found this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard windows APIs. At some point in the future 
 windows may support functionality similar to BSD jails or Linux containers, 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to the desktop so will not be 
 able to display any information or create UI.
 * The launched process will have no network credentials. Any access of 
 network resources that requires domain authentication will fail.
 h2. Implementation:
 Winutils performs the following steps:
 # Enable the required privileges for the current process.
 # Register as a trusted process with the Local Security Authority (LSA).
 # Create a new logon for the user passed on the command line.
 # Load/Create a profile on the local machine for the new logon.
 # Create a new environment for the new logon.
 # Launch the new process in a job with the task name specified and using the 
 created logon.
 # Wait for the JOB to exit.
 h2. Future work:
 The following work was scoped out of this check in:
 * Support for non-domain users or machines that are not domain joined.
 * Support for privilege isolation by running the task launcher in a high 
 privilege service with access over an ACLed named pipe.





[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073094#comment-14073094
 ] 

Hudson commented on YARN-1342:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #622 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/622/])
YARN-1342. Recover container tokens upon nodemanager restart. Contributed by 
Jason Lowe. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612995)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java


 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.6.0

 Attachments: YARN-1342.patch, YARN-1342v2.patch, 
 YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch, 
 YARN-1342v6.patch








[jira] [Commented] (YARN-2300) Document better sample requests for RM web services for submitting apps

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073100#comment-14073100
 ] 

Hudson commented on YARN-2300:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #622 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/622/])
YARN-2300. Improved the documentation of the sample requests for RM REST API - 
submitting an app. Contributed by Varun Vasudev. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612981)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


 Document better sample requests for RM web services for submitting apps
 ---

 Key: YARN-2300
 URL: https://issues.apache.org/jira/browse/YARN-2300
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.5.0

 Attachments: apache-yarn-2300.0.patch


 The documentation for RM web services should provide better examples for app 
 submission.





[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073098#comment-14073098
 ] 

Hudson commented on YARN-2147:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #622 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/622/])
YARN-2147. client lacks delegation token exception details when application 
submit fails. Contributed by Chen He (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, 
 YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack critical details needed to understand the nature of the 
 error. Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.
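
For illustration, a generic, hedged sketch of the kind of remedy the report points at; this is not the actual YARN-2147 patch, and the helper names are invented.
{code}
// Rethrow with the original exception attached as the cause so the full
// details, not just the message string, can reach the client.
public class TokenRenewFailureSketch {
  static void submitApplication(String appId) throws Exception {
    try {
      renewDelegationTokens(appId);          // hypothetical helper
    } catch (Exception e) {
      // previously only e.getMessage() was surfaced; keeping the cause chain
      // preserves the nested diagnostic end to end
      throw new Exception("Failed to renew delegation token for " + appId, e);
    }
  }

  private static void renewDelegationTokens(String appId) throws Exception {
    throw new java.io.IOException("token renewal failed for " + appId);
  }
}
{code}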





[jira] [Updated] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common

2014-07-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2347:
-

Attachment: YARN-2347-v3.patch

Sync patch with latest trunk.

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.





[jira] [Commented] (YARN-1354) Recover applications upon nodemanager restart

2014-07-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073115#comment-14073115
 ] 

Junping Du commented on YARN-1354:
--

As [~devaraj.k] commented above, [~jlowe], would you like to sync the patch 
with the latest trunk, given that many related patches were committed recently? Thx!

 Recover applications upon nodemanager restart
 -

 Key: YARN-1354
 URL: https://issues.apache.org/jira/browse/YARN-1354
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1354-v1.patch, 
 YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch


 The set of active applications in the nodemanager context needs to be 
 recovered for a work-preserving nodemanager restart.





[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073123#comment-14073123
 ] 

Hadoop QA commented on YARN-1063:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657587/YARN-1063.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common:

  org.apache.hadoop.ipc.TestIPC

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4413//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4413//console

This message is automatically generated.

 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and needs to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process 
 without granting rights to other processes launched on the same machine, but 
 found this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard windows APIs. At some point in the future 
 windows may support functionality similar to BSD jails or Linux containers, 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to 

[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-07-24 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073143#comment-14073143
 ] 

Remus Rusanu commented on YARN-1063:


TestIPC.testRetryProxy passes for me locally with the patch applied. The test 
does not exercise winutils in any way.

 Winutils needs ability to create task as domain user
 

 Key: YARN-1063
 URL: https://issues.apache.org/jira/browse/YARN-1063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
 Environment: Windows
Reporter: Kyle Leckie
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
 YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch


 h1. Summary:
 Securing a Hadoop cluster requires constructing some form of security 
 boundary around the processes executed in YARN containers. Isolation based on 
 Windows user isolation seems most feasible. This approach is similar to the 
 approach taken by the existing LinuxContainerExecutor. The current patch to 
 winutils.exe adds the ability to create a process as a domain user. 
 h1. Alternative Methods considered:
 h2. Process rights limited by security token restriction:
 On Windows access decisions are made by examining the security token of a 
 process. It is possible to spawn a process with a restricted security token. 
 Any of the rights granted by SIDs of the default token may be restricted. It 
 is possible to see this in action by examining the security token of a 
 sandboxed process launched by a web browser. Typically the launched process 
 will have a fully restricted token and needs to access machine resources 
 through a dedicated broker process that enforces a custom security policy. 
 This broker process mechanism would break compatibility with the typical 
 Hadoop container process. The Container process must be able to utilize 
 standard function calls for disk and network IO. I performed some work 
 looking at ways to ACL the local files to the specific launched process 
 without granting rights to other processes launched on the same machine, but 
 found this to be an overly complex solution. 
 h2. Relying on APP containers:
 Recent versions of windows have the ability to launch processes within an 
 isolated container. Application containers are supported for execution of 
 WinRT based executables. This method was ruled out due to the lack of 
 official support for standard windows APIs. At some point in the future 
 windows may support functionality similar to BSD jails or Linux containers, 
 at that point support for containers should be added.
 h1. Create As User Feature Description:
 h2. Usage:
 A new sub command was added to the set of task commands. Here is the syntax:
 winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
 Some notes:
 * The username specified is in the format of user@domain
 * The machine executing this command must be joined to the domain of the user 
 specified
 * The domain controller must allow the account executing the command access 
 to the user information. For this join the account to the predefined group 
 labeled Pre-Windows 2000 Compatible Access
 * The account running the command must have several rights on the local 
 machine. These can be managed manually using secpol.msc: 
 ** Act as part of the operating system - SE_TCB_NAME
 ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME
 ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME
 * The launched process will not have rights to the desktop so will not be 
 able to display any information or create UI.
 * The launched process will have no network credentials. Any access of 
 network resources that requires domain authentication will fail.
 h2. Implementation:
 Winutils performs the following steps:
 # Enable the required privileges for the current process.
 # Register as a trusted process with the Local Security Authority (LSA).
 # Create a new logon for the user passed on the command line.
 # Load/Create a profile on the local machine for the new logon.
 # Create a new environment for the new logon.
 # Launch the new process in a job with the task name specified and using the 
 created logon.
 # Wait for the JOB to exit.
 h2. Future work:
 The following work was scoped out of this check in:
 * Support for non-domain users or machines that are not domain joined.
 * Support for privilege isolation by running the task launcher in a high 
 privilege service with access over an ACLed named pipe.





[jira] [Commented] (YARN-2349) InvalidStateTransitonException after RM switch

2014-07-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073158#comment-14073158
 ] 

Rohith commented on YARN-2349:
--

This basically happens when the capacity-scheduler.xml configurations of the two 
RMs do not match. During recovery, the application is moved NEW -> ACCEPTED 
synchronously by adding the application to the scheduler. Before the scheduler 
knows about the application, RMAppImpl is moved to ACCEPTED. Any exception (for 
several reasons) during submitApplication triggers an APP_REJECTED event, which 
in turn causes the InvalidStateTransitonException.
To fix it, either enforce that both RMs' configurations are the same (adding a 
note) OR handle the APP_REJECTED event in the ACCEPTED state.
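
For illustration, a self-contained toy (not the RMAppImpl code) of the second option above: registering an explicit arc for APP_REJECTED in the ACCEPTED state so a late rejection during recovery is handled instead of raising InvalidStateTransitonException. All names below are invented.
{code}
import java.util.EnumMap;
import java.util.Map;

class AppStateMachineSketch {
  enum State { NEW, ACCEPTED, FAILED }
  enum Event { APP_ACCEPTED, APP_REJECTED }

  private final Map<State, Map<Event, State>> arcs =
      new EnumMap<State, Map<Event, State>>(State.class);
  private State current = State.NEW;

  AppStateMachineSketch() {
    add(State.NEW, Event.APP_ACCEPTED, State.ACCEPTED);
    add(State.NEW, Event.APP_REJECTED, State.FAILED);
    // the extra arc: a rejection arriving after the app was already moved to
    // ACCEPTED is handled instead of being an invalid event
    add(State.ACCEPTED, Event.APP_REJECTED, State.FAILED);
  }

  private void add(State from, Event on, State to) {
    Map<Event, State> m = arcs.get(from);
    if (m == null) {
      m = new EnumMap<Event, State>(Event.class);
      arcs.put(from, m);
    }
    m.put(on, to);
  }

  void handle(Event on) {
    Map<Event, State> m = arcs.get(current);
    State next = (m == null) ? null : m.get(on);
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + on + " at " + current);
    }
    current = next;
  }
}
{code}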

 InvalidStateTransitonException after RM switch
 --

 Key: YARN-2349
 URL: https://issues.apache.org/jira/browse/YARN-2349
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty

 {code}
 2014-07-23 19:22:28,272 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-07-23 19:22:28,273 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 45018: starting
 2014-07-23 19:22:28,266 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
 this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APP_REJECTED at ACCEPTED
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:635)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:83)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:706)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:690)
  at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
  at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 19:22:28,283 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@10.18.40.84:45020
 2014-07-23 19:22:28,291 ERROR 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  Error when openning history file of application 
 application_1406116264351_0007
 {code}





[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073161#comment-14073161
 ] 

Hadoop QA commented on YARN-2347:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657589/YARN-2347-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4414//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4414//console

This message is automatically generated.

 Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in 
 yarn-server-common
 

 Key: YARN-2347
 URL: https://issues.apache.org/jira/browse/YARN-2347
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, YARN-2347.patch


 We have similar things for version state for RM, NM, TS (TimelineServer), 
 etc. I think we should consolidate them into a common object.





[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073209#comment-14073209
 ] 

Hudson commented on YARN-2147:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1814 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1814/])
YARN-2147. client lacks delegation token exception details when application 
submit fails. Contributed by Chen He (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, 
 YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack critical details needed to understand the nature of the 
 error. Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.





[jira] [Commented] (YARN-2300) Document better sample requests for RM web services for submitting apps

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073211#comment-14073211
 ] 

Hudson commented on YARN-2300:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1814 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1814/])
YARN-2300. Improved the documentation of the sample requests for RM REST API - 
submitting an app. Contributed by Varun Vasudev. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612981)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


 Document better sample requests for RM web services for submitting apps
 ---

 Key: YARN-2300
 URL: https://issues.apache.org/jira/browse/YARN-2300
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.5.0

 Attachments: apache-yarn-2300.0.patch


 The documentation for RM web services should provide better examples for app 
 submission.





[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073205#comment-14073205
 ] 

Hudson commented on YARN-1342:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1814 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1814/])
YARN-1342. Recover container tokens upon nodemanager restart. Contributed by 
Jason Lowe. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612995)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java


 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.6.0

 Attachments: YARN-1342.patch, YARN-1342v2.patch, 
 YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch, 
 YARN-1342v6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2350) TestApplicationMasterServiceOnHA fails with InvalidToken exception

2014-07-24 Thread Ted Yu (JIRA)
Ted Yu created YARN-2350:


 Summary: TestApplicationMasterServiceOnHA fails with InvalidToken 
exception
 Key: YARN-2350
 URL: https://issues.apache.org/jira/browse/YARN-2350
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu


From https://builds.apache.org/job/Hadoop-Yarn-trunk/622 :
{code}
Running org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.591 sec <<< 
FAILURE! - in org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
testAllocateOnHA(org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA)
  Time elapsed: 8.408 sec  <<< ERROR!
org.apache.hadoop.security.token.SecretManager$InvalidToken: Given AMRMToken 
for application : appattempt_1000_0001_00 seems to have been generated 
illegally.
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy85.allocate(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at com.sun.proxy.$Proxy86.allocate(Unknown Source)
at 
org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA.testAllocateOnHA(TestApplicationMasterServiceOnHA.java:84)
{code}
This is reproducible locally.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073218#comment-14073218
 ] 

Allen Wittenauer commented on YARN-2348:


-1 as written.

Properly set up servers typically have their time set to UTC.  Changing the 
display here will conflict with what is in the log files.  If you want to 
display a different locale on the Web UI, then it needs to be selectable.

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch


 ResourceManager web UI, including the application list and scheduler, displays 
 UTC time by default, which will confuse users who do not use UTC time. The 
 web UI should display the user's local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073218#comment-14073218
 ] 

Allen Wittenauer edited comment on YARN-2348 at 7/24/14 2:07 PM:
-

-1

Properly set up servers typically have their time set to UTC.  Changing the 
display here will conflict with what is in the log files.  If you want to 
display a different locale on the Web UI, then it needs to be selectable.


was (Author: aw):
-1 as written.

Properly set up servers typically have their time set to UTC.  Changing the 
display here will conflict with what is in the log files.  If you want to 
display a different locale on the Web UI, then it needs to be selectable.

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch


 ResourceManager web UI, including the application list and scheduler, displays 
 UTC time by default, which will confuse users who do not use UTC time. The 
 web UI should display the user's local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073263#comment-14073263
 ] 

Hudson commented on YARN-1342:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1841 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1841/])
YARN-1342. Recover container tokens upon nodemanager restart. Contributed by 
Jason Lowe. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612995)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/BaseContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/security/NMTokenSecretManagerInNM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMContainerTokenSecretManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/security/TestNMTokenSecretManagerInNM.java


 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.6.0

 Attachments: YARN-1342.patch, YARN-1342v2.patch, 
 YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch, 
 YARN-1342v6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2300) Document better sample requests for RM web services for submitting apps

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073269#comment-14073269
 ] 

Hudson commented on YARN-2300:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1841 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1841/])
YARN-2300. Improved the documentation of the sample requests for RM REST API - 
submitting an app. Contributed by Varun Vasudev. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612981)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


 Document better sample requests for RM web services for submitting apps
 ---

 Key: YARN-2300
 URL: https://issues.apache.org/jira/browse/YARN-2300
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.5.0

 Attachments: apache-yarn-2300.0.patch


 The documentation for RM web services should provide better examples for app 
 submission.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073267#comment-14073267
 ] 

Hudson commented on YARN-2147:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1841 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1841/])
YARN-2147. client lacks delegation token exception details when application 
submit fails. Contributed by Chen He (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612950)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, 
 YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack critical details needed to understand the nature of the 
 error. Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2351) YARN CLI should provide a command to list the configurations in use

2014-07-24 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2351:
-

 Summary: YARN CLI should provide a command to list the 
configurations in use
 Key: YARN-2351
 URL: https://issues.apache.org/jira/browse/YARN-2351
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.6.0
Reporter: Zhijie Shen


To more easily understand the expected behavior of a YARN component, it would be 
good to have a command line option to print the configurations in use for the RM, 
NM and timeline server daemons, as we can do now via the web interfaces:

{code}
http://<RM|NM|Timeline host>:<port>/conf
{code}

The command line could be something like:

{code}
yarn conf resourcemanager|nodemanager|timelineserver [host]
{code}
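
As an illustration only (the class below and its name are hypothetical, not an 
existing YARN CLI), such a subcommand could simply fetch the daemon's existing 
/conf servlet and print the response:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: dump the configuration served by a daemon's /conf endpoint.
public class ConfDump {
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: ConfDump <host:port>");
      System.exit(1);
    }
    URL url = new URL("http://" + args[0] + "/conf");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}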



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2351) YARN CLI should provide a command to list the configurations in use

2014-07-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073324#comment-14073324
 ] 

Allen Wittenauer commented on YARN-2351:


hdfs already has getconf, so this should be an analog and/or expansion of that 
command for consistency.

 YARN CLI should provide a command to list the configurations in use
 ---

 Key: YARN-2351
 URL: https://issues.apache.org/jira/browse/YARN-2351
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.6.0
Reporter: Zhijie Shen

 To more easily understand the expected behavior of a YARN component, it would be 
 good to have a command line option to print the configurations in use for 
 the RM, NM and timeline server daemons, as we can do now via the web 
 interfaces:
 {code}
 http://<RM|NM|Timeline host>:<port>/conf
 {code}
 The command line could be something like:
 {code}
 yarn conf resourcemanager|nodemanager|timelineserver [host]
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-24 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073328#comment-14073328
 ] 

Alejandro Abdelnur commented on YARN-2348:
--

Allen's suggestion of making it selectable from the browser makes sense. 

In Oozie, we are doing this. Because JavaScript does not have built-in 
libraries for TZ handling, what we did is:

* have a request parameter that specifies the desired TZ for datetime values; 
the default value is UTC.
* do the TZ conversion on the server side when producing the JSON output, using 
the TZ request parameter.
* have a REST call that returns the list of available TZs.
* have a dropdown in the UI that shows the available TZs (uses the REST call 
from the previous bullet).
* use a cookie to remember the user-selected TZ.
* if the cookie is present, set the TZ request parameter with it (a rough sketch 
of the server-side piece follows below).
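
For illustration, here is a minimal sketch of that server-side piece in Java 
(hypothetical class and parameter names, assuming the servlet API is on the 
classpath; this is not Oozie's or YARN's actual code): resolve the TZ from a 
request parameter or cookie, default to UTC, and format timestamps with it 
before emitting JSON.

{code}
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimeZoneParam {

  // Resolve the requested time zone: an explicit "tz" request parameter wins,
  // then a remembered "tz" cookie, otherwise fall back to UTC.
  public static TimeZone resolve(HttpServletRequest req) {
    String tz = req.getParameter("tz");
    if (tz == null && req.getCookies() != null) {
      for (Cookie c : req.getCookies()) {
        if ("tz".equals(c.getName())) {
          tz = c.getValue();
        }
      }
    }
    return tz == null ? TimeZone.getTimeZone("UTC") : TimeZone.getTimeZone(tz);
  }

  // Format a timestamp in the resolved zone before it goes into the JSON output.
  public static String format(long millis, TimeZone zone) {
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    fmt.setTimeZone(zone);
    return fmt.format(new Date(millis));
  }
}
{code}

The dropdown and cookie handling stay purely client-side; the server only ever 
sees the tz parameter.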



 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch


 ResourceManager web UI, including the application list and scheduler, displays 
 UTC time by default, which will confuse users who do not use UTC time. The 
 web UI should display the user's local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2351) YARN CLI should provide a command to list the configurations in use

2014-07-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073334#comment-14073334
 ] 

Zhijie Shen commented on YARN-2351:
---

Noticed that. Agreed that we can do a similar thing for YARN.

 YARN CLI should provide a command to list the configurations in use
 ---

 Key: YARN-2351
 URL: https://issues.apache.org/jira/browse/YARN-2351
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.6.0
Reporter: Zhijie Shen

 To more easily understand the expected behavior of a YARN component, it would be 
 good to have a command line option to print the configurations in use for 
 the RM, NM and timeline server daemons, as we can do now via the web 
 interfaces:
 {code}
 http://<RM|NM|Timeline host>:<port>/conf
 {code}
 The command line could be something like:
 {code}
 yarn conf resourcemanager|nodemanager|timelineserver [host]
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2351) YARN CLI should provide a command to list the configurations in use

2014-07-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073348#comment-14073348
 ] 

Allen Wittenauer commented on YARN-2351:


The big thing is consistency... so you're getting yarn getconf as the 
subcommand. :)

 YARN CLI should provide a command to list the configurations in use
 ---

 Key: YARN-2351
 URL: https://issues.apache.org/jira/browse/YARN-2351
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.6.0
Reporter: Zhijie Shen

 To more easily understand the expected behavior of a YARN component, it would be 
 good to have a command line option to print the configurations in use for 
 the RM, NM and timeline server daemons, as we can do now via the web 
 interfaces:
 {code}
 http://<RM|NM|Timeline host>:<port>/conf
 {code}
 The command line could be something like:
 {code}
 yarn conf resourcemanager|nodemanager|timelineserver [host]
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2313) Livelock can occur in FairScheduler when there are lots of running apps

2014-07-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073455#comment-14073455
 ] 

Karthik Kambatla commented on YARN-2313:


[~ozawa] - thanks. I have started looking at it and we can do it on YARN-2328. 
I hope you haven't also started working on it. 

 Livelock can occur in FairScheduler when there are lots of running apps
 ---

 Key: YARN-2313
 URL: https://issues.apache.org/jira/browse/YARN-2313
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, 
 YARN-2313.4.patch, rm-stack-trace.txt


 Observed a livelock in FairScheduler when there are lots of entries in the queue. 
 After investigating the code, the following case can occur:
 1. {{update()}} called by UpdateThread takes longer than 
 UPDATE_INTERVAL (500ms) if there are many queues.
 2. UpdateThread goes into a busy loop (see the sketch after this list).
 3. Other threads (AllocationFileReloader, 
 ResourceManager$SchedulerEventDispatcher) can wait forever.
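
As a sketch of the loop shape behind step 2 (simplified and hypothetical, not the 
actual FairScheduler code): once {{update()}} takes longer than the interval, the 
remaining sleep time drops to zero and the thread never yields.

{code}
// Simplified illustration of how a periodic update loop degenerates into a busy loop.
public class UpdateLoop implements Runnable {
  private static final long UPDATE_INTERVAL_MS = 500;

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long start = System.currentTimeMillis();
      update();  // may exceed UPDATE_INTERVAL_MS when there are many queues
      long sleepMs = UPDATE_INTERVAL_MS - (System.currentTimeMillis() - start);
      if (sleepMs > 0) {
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
      // When sleepMs <= 0 the loop re-enters update() immediately; since update()
      // holds the scheduler lock, other threads waiting on that lock can starve.
    }
  }

  private void update() {
    // recompute fair shares for all queues (placeholder)
  }
}
{code}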



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance

2014-07-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073467#comment-14073467
 ] 

Karthik Kambatla commented on YARN-2352:


Once this is done, we can look into getting statistics over a sliding window, 
either over a number of calls or over a time interval. 
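
A minimal sketch of such a sliding-window duration metric (illustrative only; the 
class and method names are hypothetical, not the eventual FairScheduler metrics):

{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window timer: keeps the last N duration samples and
// reports their average, e.g. for update(), node events, or preemption.
public class SlidingWindowTimer {
  private final int windowSize;
  private final Deque<Long> samplesMs = new ArrayDeque<Long>();
  private long sumMs = 0;

  public SlidingWindowTimer(int windowSize) {
    this.windowSize = windowSize;
  }

  public synchronized void record(long durationMs) {
    samplesMs.addLast(durationMs);
    sumMs += durationMs;
    if (samplesMs.size() > windowSize) {
      sumMs -= samplesMs.removeFirst();
    }
  }

  public synchronized double averageMs() {
    return samplesMs.isEmpty() ? 0.0 : (double) sumMs / samplesMs.size();
  }
}
{code}

A caller would surround the timed section with System.currentTimeMillis() (or 
System.nanoTime()) and feed the difference into record().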

 FairScheduler: Collect metrics on duration of critical methods that affect 
 performance
 --

 Key: YARN-2352
 URL: https://issues.apache.org/jira/browse/YARN-2352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 We need more metrics for better visibility into FairScheduler performance. At 
 the least, we need to do this for (1) handling node events, (2) update, 
 (3) computing fair shares, and (4) preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2352) FairScheduler: Collect metrics on duration of critical methods that affect performance

2014-07-24 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2352:
--

 Summary: FairScheduler: Collect metrics on duration of critical 
methods that affect performance
 Key: YARN-2352
 URL: https://issues.apache.org/jira/browse/YARN-2352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


We need more metrics for better visibility into FairScheduler performance. At 
the least, we need to do this for (1) handling node events, (2) update, 
(3) computing fair shares, and (4) preemption.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2353) FairScheduler: Update demand asynchronously instead of in the Update Thread

2014-07-24 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2353:
--

 Summary: FairScheduler: Update demand asynchronously instead of in 
the Update Thread
 Key: YARN-2353
 URL: https://issues.apache.org/jira/browse/YARN-2353
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-07-24 Thread chang li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chang li reassigned YARN-2308:
--

Assignee: chang li

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical

 I encountered an NPE during RM restart
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM will fail to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new ones. So when the RM restarts, it tries to recover historical 
 applications, and when the queue of any of these applications has been removed, 
 an NPE will be raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073550#comment-14073550
 ] 

Jian He commented on YARN-2211:
---

- This code not needed? may remove the newInstance() method also
{code}
AMRMTokenSecretManagerState amrmTokenSecretManagerState =
AMRMTokenSecretManagerState.newInstance();
{code}
- currentKey will never be null ? if so, we can remove the check.
{code} 
if (currentKey != null) {
this.currentMasterKey =
new MasterKeyData(currentKey, createSecretKey(currentKey.getBytes()
  .array()));
  }
 if (currentMasterKey != null 
{code}
- Instead of moving the following to yarn_proto, we should probably have a 
separate jira to move all the RM recovery related records to resource manager 
module. For now, I think we can create a new proto file and move amrm token 
state there.{code}
message MasterKeyProto {
  optional int32 key_id = 1;
  optional bytes bytes = 2;
}{code}

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, 
 YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, 
 YARN-2211.6.1.patch, YARN-2211.6.patch


 After YARN-2208, AMRMToken can be rolled over periodically. We need to save 
 related Master Keys and use them to recover the AMRMToken when RM 
 restart/failover happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts

2014-07-24 Thread Jian He (JIRA)
Jian He created YARN-2354:
-

 Summary: DistributedShell may allocate more containers than client 
specified after it restarts
 Key: YARN-2354
 URL: https://issues.apache.org/jira/browse/YARN-2354
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He


To reproduce, run distributed shell with the -num_containers option.
In ApplicationMaster.java, the following code has an issue:
{code}
  int numTotalContainersToRequest =
numTotalContainers - previousAMRunningContainers.size();
for (int i = 0; i < numTotalContainersToRequest; ++i) {
  ContainerRequest containerAsk = setupContainerAskForRM();
  amRMClient.addContainerRequest(containerAsk);
}
numRequestedContainers.set(numTotalContainersToRequest);
{code}
 numRequestedContainers doesn't account for the previous AM's requested containers, 
so numRequestedContainers should be set to numTotalContainers.
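
A minimal sketch of the suggested accounting change, applied to the snippet above 
(illustrative only, not the attached patch):

{code}
int numTotalContainersToRequest =
    numTotalContainers - previousAMRunningContainers.size();
// Only ask the RM for the containers the previous attempt is not already running...
for (int i = 0; i < numTotalContainersToRequest; ++i) {
  ContainerRequest containerAsk = setupContainerAskForRM();
  amRMClient.addContainerRequest(containerAsk);
}
// ...but record the full target, so the restarted AM does not later request more
// containers than the client originally specified.
numRequestedContainers.set(numTotalContainers);
{code}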



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-07-24 Thread chang li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073574#comment-14073574
 ] 

chang li commented on YARN-2308:


I am working on it

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical

 I encountered an NPE during RM restart
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM will fail to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new ones. So when the RM restarts, it tries to recover historical 
 applications, and when the queue of any of these applications has been removed, 
 an NPE will be raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073577#comment-14073577
 ] 

Xuan Gong commented on YARN-2211:
-

bq. This code not needed? may remove the newInstance() method also

It is used in RMStateStore initialization. 

bq. currentKey will never be null ? if so, we can remove the check.

Might need to keep the NULL check. If the RM starts from a brand new state, 
there is no saved state at all, so currentKey is NULL.

bq. Instead of moving the following to yarn_proto, we should probably have a 
separate jira to move all the RM recovery related records to resource manager 
module. For now, I think we can create a new proto file and move amrm token 
state there.

We can do that. 

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, 
 YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, 
 YARN-2211.6.1.patch, YARN-2211.6.patch


 After YARN-2208, AMRMToken can be rolled over periodically. We need to save 
 related Master Keys and use them to recover the AMRMToken when RM 
 restart/failover happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1336) Work-preserving nodemanager restart

2014-07-24 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1336:
-

Attachment: YARN-1336-rollup-v2.patch

Refreshing the rollup patch to latest trunk so it's easier for people to play 
with the feature and get a general sense of things before the rest of the 
patches are integrated.   Notable fixes since the last rollup patch include 
fixing container reacquisition and avoiding deleting log directories on NM 
teardown when we're restarting.

 Work-preserving nodemanager restart
 ---

 Key: YARN-1336
 URL: https://issues.apache.org/jira/browse/YARN-1336
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: NMRestartDesignOverview.pdf, YARN-1336-rollup-v2.patch, 
 YARN-1336-rollup.patch


 This serves as an umbrella ticket for tasks related to work-preserving 
 nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2328) FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped

2014-07-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073687#comment-14073687
 ] 

Karthik Kambatla commented on YARN-2328:


As per the discussion on YARN-2313, it might be better to have a single thread 
run all background tasks, so that within the smallest of these tasks' periods the 
foreground tasks are guaranteed a dedicated chunk of time with no background 
thread holding a lock on the FairScheduler. 
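
A minimal sketch of that idea (hypothetical names, not the eventual YARN-2328 
code): a single scheduled executor runs every periodic maintenance task, so at 
most one background task at a time can hold the scheduler lock, and fixed-delay 
scheduling avoids the busy loop seen in YARN-2313 when a task overruns its period.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical single-threaded runner for periodic scheduler maintenance tasks.
public class BackgroundTasksRunner {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();

  // All tasks share one thread; scheduleWithFixedDelay measures the delay from
  // the end of one run, so an overrunning task cannot monopolize the scheduler.
  public void schedule(Runnable task, long periodMs) {
    executor.scheduleWithFixedDelay(task, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  public void stop() {
    executor.shutdownNow();
  }
}
{code}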

 FairScheduler: Verify update and continuous scheduling threads are stopped 
 when the scheduler is stopped
 

 Key: YARN-2328
 URL: https://issues.apache.org/jira/browse/YARN-2328
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Attachments: yarn-2328-1.patch, yarn-2328-2.patch, yarn-2328-2.patch


 FairScheduler threads can use a little cleanup and tests. To begin with, the 
 update and continuous-scheduling threads should extend Thread and handle 
 being interrupted. We should have tests for starting and stopping them as 
 well. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-24 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073727#comment-14073727
 ] 

Mayank Bansal commented on YARN-2069:
-

Thanks [~vinodkv] for the review.

I have changed the patch based on the targeted capacity for the queue. It 
balances out with the users' resources.
I also removed the two passes, and now it's only one pass.

Please review it.

Thanks,
Mayank

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch


 This is different from (even if related to, and likely shares code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if a queue has its guaranteed 
 capacity, its individual users are treated in line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-24 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2069:


Attachment: YARN-2069-trunk-6.patch

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch


 This is different from (even if related to, and likely shares code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if a queue has its guaranteed 
 capacity, its individual users are treated in line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1354) Recover applications upon nodemanager restart

2014-07-24 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1354:
-

Attachment: YARN-1354-v4.patch

Thanks for the interest, Devaraj and Junping!  I updated the patch to trunk.

 Recover applications upon nodemanager restart
 -

 Key: YARN-1354
 URL: https://issues.apache.org/jira/browse/YARN-1354
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1354-v1.patch, 
 YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, 
 YARN-1354-v4.patch


 The set of active applications in the nodemanager context need to be 
 recovered for work-preserving nodemanager restart



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts

2014-07-24 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-2354:
---

Assignee: Li Lu

 DistributedShell may allocate more containers than client specified after it 
 restarts
 -

 Key: YARN-2354
 URL: https://issues.apache.org/jira/browse/YARN-2354
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Li Lu

 To reproduce, run distributed shell with the -num_containers option.
 In ApplicationMaster.java, the following code has an issue:
 {code}
   int numTotalContainersToRequest =
 numTotalContainers - previousAMRunningContainers.size();
 for (int i = 0; i < numTotalContainersToRequest; ++i) {
   ContainerRequest containerAsk = setupContainerAskForRM();
   amRMClient.addContainerRequest(containerAsk);
 }
 numRequestedContainers.set(numTotalContainersToRequest);
 {code}
  numRequestedContainers doesn't account for the previous AM's requested 
 containers, so numRequestedContainers should be set to numTotalContainers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-24 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-415:


Attachment: YARN-415.201407242148.txt

[~leftnoteasy]

Thank you very much for helping me review this.

{quote}
1) RMAppAttemptImpl.java
 1.1 There're some irrelevant line changes in RMAppAttemptImpl, could you 
please revert them? Like
{code}
   RMAppAttemptEventType.RECOVER, new AttemptRecoveredTransition())
-  
+
{code}
{quote}
Changes completed.

{quote}
1.2 getResourceUtilization:
{code}
+if (rmApps != null) {
+  RMApp app = rmApps.get(attemptId.getApplicationId());
+  if (app != null) {
{code}
 I think the two cannot happen, so we don't need to check for null to avoid a 
 potential bug here
{quote}
Changes completed.

{quote}
{code}
+  ApplicationResourceUsageReport appResUsageRpt =
{code}

It's better to name it appResUsageReport since rpt is not a common abbr of 
report.
{quote}
Changes completed.

{quote}
2) RMContainerImpl.java
 2.1 updateAttemptMetrics:

{code}
  if (rmApps != null) {
RMApp rmApp = 
rmApps.get(container.getApplicationAttemptId().getApplicationId());
if (rmApp != null) {
{code}

 Again, I think the two null checks are unnecessary
{quote}
I was able to remove the rmApps variable, but I had to leave the check for 
{{app != null}} because if I try to take that out, several unit tests would 
fail with NullPointerException. Even with removing the rmApps variable, I 
needed to change TestRMContainerImpl.java to mock rmContext.getRMApps().

{quote}
3) SchedulerApplicationAttempt.java
  3.1 Some rename suggestions (please let me know if you have a better idea):
  CACHE_MILLI -> MEMORY_UTILIZATION_CACHE_MILLISECONDS
  lastTime -> lastMemoryUtilizationUpdateTime
  cachedMemorySeconds -> lastMemorySeconds
  same for cachedVCore ...
{quote}
Changes complete.

{quote}
4) AppBlock.java
 Should we rename Resource Seconds: to Resource Utilization or something?
{quote}
I changed it as you suggested. It feels like there should be something that 
would describe it better, but I can't think of anything right now.

{quote}
5) Test
  5.1 I'm wondering if we need to add an end-to-end test, since we changed 
 RMAppAttempt/RMContainerImpl/SchedulerApplicationAttempt.
  It could consist of submitting an application, launching several containers, and 
 finishing the application. And it's better to make the launched application 
 contain several application attempts.
  While the application is running, there are multiple containers running and 
 multiple containers finished. We can check whether the total resource 
 utilization is as expected.
{quote}
I'm still working on the unit tests as you suggested, but I wanted to get the 
rest of the patch up first so you can look at it :-)

{quote}
bq. One thing I did notice when these values are cached is that there is a race 
where containers can get counted twice:

I think this cannot be avoided; it should be a transient state, and Jian He and I 
discussed this a long time ago.
 But apparently the 3 sec cache makes it not only a transient state. I suggest you 
make lastTime in SchedulerApplicationAttempt protected, and in 
FiCaSchedulerApp/FSSchedulerApp, when removing a container from liveContainers 
(in the completedContainer method), set lastTime to a negative value like -1, so 
that the next time the accumulated resource utilization is requested, it will 
recompute all container utilization.
{quote}
I made the changes to {{lastTime}} as you suggested. I agree that there will 
always be a possibility that the container will still be in the 
{{liveContainers}} list for a very brief period after the container has 
finished. With the cached values the way I had them before, this gap was 
noticeable in the resource calculations. Your suggested changes brought the 
race back down even for the cached values.
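
For illustration, a condensed sketch of the caching pattern being discussed 
(hypothetical, simplified field names rather than the actual 
SchedulerApplicationAttempt code): cached totals are reused within a short 
window, and setting the timestamp to -1 when a container completes forces the 
next read to recompute.

{code}
// Simplified illustration of the cached resource-utilization pattern.
public class CachedUtilization {
  private static final long MEMORY_UTILIZATION_CACHE_MILLISECONDS = 3000;

  private long lastMemoryUtilizationUpdateTime = -1;  // -1 forces a recompute
  private long lastMemorySeconds = 0;

  public synchronized long getMemorySeconds() {
    long now = System.currentTimeMillis();
    if (lastMemoryUtilizationUpdateTime < 0
        || now - lastMemoryUtilizationUpdateTime > MEMORY_UTILIZATION_CACHE_MILLISECONDS) {
      lastMemorySeconds = recomputeFromContainers();
      lastMemoryUtilizationUpdateTime = now;
    }
    return lastMemorySeconds;
  }

  // Called when a container completes so a stale cached value is not served.
  public synchronized void invalidate() {
    lastMemoryUtilizationUpdateTime = -1;
  }

  private long recomputeFromContainers() {
    // placeholder: sum finished-container totals plus live-container usage
    return 0;
  }
}
{code}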


 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an 

[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-24 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2069:


Attachment: YARN-2069-trunk-7.patch

Updated patch

Thanks,
Mayank

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch


 This is different from (even if related to, and likely shares code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if a queue has its guaranteed 
 capacity, its individual users are treated in line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2328) FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped

2014-07-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2328:
---

Attachment: yarn-2328-preview.patch

Here is a preview patch that introduces FSBackgroundTasksRunner. As of now, 
only UpdateThread and ContinuousSchedulingThread are added here. We might want 
to move the AllocationFileLoaderService here as well. 

Appreciate any initial feedback on the approach or the patch. 

 FairScheduler: Verify update and continuous scheduling threads are stopped 
 when the scheduler is stopped
 

 Key: YARN-2328
 URL: https://issues.apache.org/jira/browse/YARN-2328
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Attachments: yarn-2328-1.patch, yarn-2328-2.patch, yarn-2328-2.patch, 
 yarn-2328-preview.patch


 FairScheduler threads can use a little cleanup and tests. To begin with, the 
 update and continuous-scheduling threads should extend Thread and handle 
 being interrupted. We should have tests for starting and stopping them as 
 well. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2211:


Attachment: YARN-2211.7.patch

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, 
 YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, 
 YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.patch


 After YARN-2208, AMRMToken can be rolled over periodically. We need to save 
 related Master Keys and use them to recover the AMRMToken when RM 
 restart/failover happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-24 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073791#comment-14073791
 ] 

Xuan Gong commented on YARN-2211:
-

New patch addresses all the latest comments

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, 
 YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, 
 YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.patch


 After YARN-2208, AMRMToken can be rolled over periodically. We need to save 
 related Master Keys and use them to recover the AMRMToken when RM 
 restart/failover happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073874#comment-14073874
 ] 

Hadoop QA commented on YARN-2211:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657706/YARN-2211.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4419//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4419//console

This message is automatically generated.

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, 
 YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, 
 YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.patch


 After YARN-2208, AMRMToken can be rolled over periodically. We need to save 
 related Master Keys and use them to recover the AMRMToken when RM 
 restart/failover happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-24 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2211:


Attachment: YARN-2211.7.1.patch

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, 
 YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, 
 YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.1.patch, YARN-2211.7.patch


 After YARN-2208, AMRMToken can be rolled over periodically. We need to save 
 related Master Keys and use them to recover the AMRMToken when RM 
 restart/failover happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2335) Annotate all hadoop-sls APIs as @Private

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073896#comment-14073896
 ] 

Hadoop QA commented on YARN-2335:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657215/YARN-2335-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4420//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4420//console

This message is automatically generated.

 Annotate all hadoop-sls APIs as @Private
 

 Key: YARN-2335
 URL: https://issues.apache.org/jira/browse/YARN-2335
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2335-1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073899#comment-14073899
 ] 

Wangda Tan commented on YARN-2308:
--

[~lichangleo], thanks for working on it!
Looking forward to your patch.

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical

 I encountered an NPE during RM restart
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM then fails to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new ones. So when the RM restarts and tries to recover the historical 
 applications, an NPE is raised for any of these applications whose queue has 
 been removed.
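 A rough illustration of the kind of defensive handling the recovery path needs (simplified 
 stand-in classes, not the actual CapacityScheduler code; the class and method names below 
 are hypothetical):
 {code}
 import java.util.HashMap;
 import java.util.Map;
 
 public class QueueRecoverySketch {
   static class Queue { final String name; Queue(String name) { this.name = name; } }
 
   private final Map<String, Queue> queues = new HashMap<>();
 
   void addApplicationAttempt(String appId, String queueName) {
     Queue queue = queues.get(queueName);
     if (queue == null) {
       // The queue was removed from the configuration before the restart:
       // reject the recovered app instead of dereferencing null and crashing
       // the scheduler event dispatcher.
       System.err.println("Rejecting recovered app " + appId
           + ": queue " + queueName + " no longer exists");
       return;
     }
     System.out.println("Recovered app " + appId + " into queue " + queue.name);
   }
 
   public static void main(String[] args) {
     QueueRecoverySketch s = new QueueRecoverySketch();
     s.queues.put("default", new Queue("default"));
     s.addApplicationAttempt("app_1", "default");   // recovers normally
     s.addApplicationAttempt("app_2", "removedQ");  // handled gracefully, no NPE
   }
 }
 {code}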



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart

2014-07-24 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2209:
--

Attachment: YARN-2209.2.patch

Patch rebased on trunk

 Replace allocate#resync command with ApplicationMasterNotRegisteredException 
 to indicate AM to re-register on RM restart
 

 Key: YARN-2209
 URL: https://issues.apache.org/jira/browse/YARN-2209
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2209.1.patch, YARN-2209.2.patch


 YARN-1365 introduced an ApplicationMasterNotRegisteredException to tell the 
 application to re-register on RM restart. We should do the same for the 
 AMS#allocate call as well.
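 A minimal sketch of the intended AM-side behaviour, using stand-in types rather than the 
 real YARN client classes (NotRegisteredException and Scheduler below are hypothetical): on 
 an exception signalling that the RM has restarted and lost the registration, the AM 
 re-registers and then resends the allocate request.
 {code}
 public class ReRegisterSketch {
   static class NotRegisteredException extends RuntimeException {}
 
   interface Scheduler {
     void registerApplicationMaster();
     String allocate(float progress);
   }
 
   static String allocateWithReRegister(Scheduler rm, float progress) {
     try {
       return rm.allocate(progress);
     } catch (NotRegisteredException e) {
       rm.registerApplicationMaster();  // RM restarted: register again
       return rm.allocate(progress);    // then retry the allocate call
     }
   }
 
   public static void main(String[] args) {
     Scheduler fake = new Scheduler() {
       boolean registered = false;
       public void registerApplicationMaster() { registered = true; }
       public String allocate(float progress) {
         if (!registered) throw new NotRegisteredException();
         return "allocated";
       }
     };
     System.out.println(allocateWithReRegister(fake, 0.5f)); // prints "allocated"
   }
 }
 {code}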



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073918#comment-14073918
 ] 

Hadoop QA commented on YARN-2209:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657737/YARN-2209.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4422//console

This message is automatically generated.

 Replace allocate#resync command with ApplicationMasterNotRegisteredException 
 to indicate AM to re-register on RM restart
 

 Key: YARN-2209
 URL: https://issues.apache.org/jira/browse/YARN-2209
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2209.1.patch, YARN-2209.2.patch


 YARN-1365 introduced an ApplicationMasterNotRegisteredException to tell the 
 application to re-register on RM restart. We should do the same for the 
 AMS#allocate call as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness

2014-07-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073921#comment-14073921
 ] 

Karthik Kambatla commented on YARN-2214:


Patch looks mostly good. One nit: can we move the check in 
FSLeafQueue#preemptContainer before the debug logging. 

 preemptContainerPreCheck() in FSParentQueue delays convergence towards 
 fairness
 ---

 Key: YARN-2214
 URL: https://issues.apache.org/jira/browse/YARN-2214
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2214-v1.txt


 preemptContainerPreCheck() in FSParentQueue rejects preemption requests if 
 the parent queue is below its fair share. This can cause a delay in converging 
 towards fairness when the starved leaf queue and the queue above its fair share 
 belong under a non-root parent queue (i.e. their least common ancestor is a 
 parent queue which is not root).
 Here is an example:
 root.parent has fair share = 80% and usage = 80%
 root.parent.child1 has fair share = 40% and usage = 80%
 root.parent.child2 has fair share = 40% and usage = 0%
 Now a job is submitted to child2 and its demand is 40%.
 Preemption will kick in and try to reclaim all of the 40% from child1.
 When it preempts the first container from child1, the usage of root.parent 
 drops below 80%, i.e. below root.parent's fair share, causing 
 preemption to stop. So only one container gets preempted in this round, 
 although the need is a lot more. child2 would eventually get to half its fair 
 share, but only after multiple rounds of preemption.
 The solution is to remove preemptContainerPreCheck() from FSParentQueue and keep it 
 only in FSLeafQueue (where it already exists).
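 A toy sketch of the proposed behaviour (simplified stand-in classes, not the real 
 FairScheduler code): only the leaf queue checks whether it is above its fair share before 
 giving up a container, so the parent queue sitting at or just below its own fair share no 
 longer blocks further preemption.
 {code}
 public class PreemptPreCheckSketch {
   static class LeafQueue {
     double usage;      // fraction of the cluster, e.g. 0.8
     double fairShare;  // fraction of the cluster, e.g. 0.4
 
     boolean preemptContainerPreCheck() {
       // A leaf queue at or below its fair share should not lose containers.
       return usage > fairShare;
     }
   }
 
   public static void main(String[] args) {
     LeafQueue child1 = new LeafQueue();
     child1.usage = 0.8;
     child1.fairShare = 0.4;
     // child1 is well above its fair share, so preemption from it stays allowed
     // even while its parent is at (or just below) the parent's fair share.
     System.out.println("can preempt from child1: " + child1.preemptContainerPreCheck());
   }
 }
 {code}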



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart

2014-07-24 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2209:
--

Attachment: YARN-2209.3.patch

 Replace allocate#resync command with ApplicationMasterNotRegisteredException 
 to indicate AM to re-register on RM restart
 

 Key: YARN-2209
 URL: https://issues.apache.org/jira/browse/YARN-2209
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch


 YARN-1365 introduced an ApplicationMasterNotRegisteredException to tell the 
 application to re-register on RM restart. We should do the same for the 
 AMS#allocate call as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073940#comment-14073940
 ] 

Hadoop QA commented on YARN-2211:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657728/YARN-2211.7.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4421//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4421//console

This message is automatically generated.

 RMStateStore needs to save AMRMToken master key for recovery when RM 
 restart/failover happens 
 --

 Key: YARN-2211
 URL: https://issues.apache.org/jira/browse/YARN-2211
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, 
 YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch, 
 YARN-2211.6.1.patch, YARN-2211.6.patch, YARN-2211.7.1.patch, YARN-2211.7.patch


 After YARN-2208, AMRMToken can be rolled over periodically. We need to save 
 related Master Keys and use them to recover the AMRMToken when RM 
 restart/failover happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2338) service assemble so complex

2014-07-24 Thread dingjiaqi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dingjiaqi updated YARN-2338:


Priority: Critical  (was: Major)

 service assemble so complex
 ---

 Key: YARN-2338
 URL: https://issues.apache.org/jira/browse/YARN-2338
 Project: Hadoop YARN
  Issue Type: Wish
Reporter: tangjunjie
Priority: Critical

   See ResourceManager#serviceInit(Configuration configuration): 
 so many services are assembled into the ResourceManager by hand there.
 Use Guice or another service-assembly framework to refactor this complex code 
 (see the sketch below).
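 For what the wish is pointing at, a very small Guice illustration (the service interface 
 and implementation names are hypothetical, and this is not a proposal for the actual 
 ResourceManager wiring): services are bound in a module instead of being constructed and 
 chained by hand inside serviceInit().
 {code}
 import com.google.inject.AbstractModule;
 import com.google.inject.Guice;
 import com.google.inject.Injector;
 
 public class RmWiringSketch {
   interface SchedulerService { void start(); }
 
   static class CapacitySchedulerService implements SchedulerService {
     public void start() { System.out.println("scheduler started"); }
   }
 
   static class RmModule extends AbstractModule {
     @Override
     protected void configure() {
       // Each service binding lives here instead of inside a long serviceInit().
       bind(SchedulerService.class).to(CapacitySchedulerService.class);
     }
   }
 
   public static void main(String[] args) {
     Injector injector = Guice.createInjector(new RmModule());
     injector.getInstance(SchedulerService.class).start();
   }
 }
 {code}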



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree

2014-07-24 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-2336:
--

Attachment: YARN-2336-3.patch

Thanks for the review, [~ajisakaa]. Updated to address the comments:
- Removed unused variables (leaf1/leaf2)
- Wrapped lines to keep them under 80 characters
- Removed an unused import from FairSchedulerQueueInfo.java

 Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
 --

 Key: YARN-2336
 URL: https://issues.apache.org/jira/browse/YARN-2336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Kenji Kikushima
Assignee: Kenji Kikushima
 Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336.patch


 When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a 
 missing '[' bracket for childQueues.
 This issue was found by [~ajisakaa] at YARN-1050.
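 For reference, the shape the fix is after, sketched with made-up queue names (only the 
 childQueues field name comes from this issue; the rest is illustrative): nested queues 
 should be emitted as a proper JSON array.
 {noformat}
 "childQueues": [
   { "queueName": "root.q1.sub1" },
   { "queueName": "root.q1.sub2" }
 ]
 {noformat}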



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073967#comment-14073967
 ] 

Hadoop QA commented on YARN-1994:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657741/YARN-1994.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4423//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4423//console

This message is automatically generated.

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.2.patch, 
 YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, 
 YARN-1994.7.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)
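 A standalone illustration of the requested split using plain java.net sockets (not the 
 YARN RPC layer; the port number is arbitrary): the server binds to the wildcard address so 
 every interface on a multihomed host accepts connections, while clients still dial a 
 concrete hostname rather than 0.0.0.0.
 {code}
 import java.io.IOException;
 import java.net.InetAddress;
 import java.net.InetSocketAddress;
 import java.net.ServerSocket;
 import java.net.Socket;
 
 public class MultihomedBindSketch {
   public static void main(String[] args) throws IOException {
     // Server side: bind to 0.0.0.0 (wildcard) on a fixed, arbitrary port.
     ServerSocket server = new ServerSocket();
     server.bind(new InetSocketAddress(InetAddress.getByName("0.0.0.0"), 18032));
 
     // Client side: never connect to 0.0.0.0; use the advertised hostname instead.
     try (Socket client = new Socket("localhost", 18032)) {
       System.out.println("connected via hostname while the server is bound to the wildcard");
     } finally {
       server.close();
     }
   }
 }
 {code}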



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2349) InvalidStateTransitonException after RM switch

2014-07-24 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-2349:


Assignee: Rohith

 InvalidStateTransitonException after RM switch
 --

 Key: YARN-2349
 URL: https://issues.apache.org/jira/browse/YARN-2349
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Assignee: Rohith

 {code}
 2014-07-23 19:22:28,272 INFO org.apache.hadoop.ipc.Server: IPC Server 
 Responder: starting
 2014-07-23 19:22:28,273 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 45018: starting
 2014-07-23 19:22:28,266 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
 this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 APP_REJECTED at ACCEPTED
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
  at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:635)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:83)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:706)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:690)
  at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
  at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 19:22:28,283 INFO org.mortbay.log: Stopped 
 SelectChannelConnector@10.18.40.84:45020
 2014-07-23 19:22:28,291 ERROR 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  Error when openning history file of application 
 application_1406116264351_0007
 {code}
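 For readers unfamiliar with the pattern behind the stack trace, a simplified stand-in for 
 the YARN state machine (not the real StateMachineFactory; states and events are trimmed 
 down): transitions live in a table keyed by (state, event), and an event with no entry, 
 such as APP_REJECTED arriving while the app is already ACCEPTED, is reported as invalid.
 {code}
 import java.util.HashMap;
 import java.util.Map;
 
 public class AppStateMachineSketch {
   enum State { SUBMITTED, ACCEPTED, FAILED }
   enum Event { APP_ACCEPTED, APP_REJECTED }
 
   private final Map<String, State> table = new HashMap<>();
   private State current = State.SUBMITTED;
 
   AppStateMachineSketch() {
     table.put(State.SUBMITTED + "/" + Event.APP_ACCEPTED, State.ACCEPTED);
     table.put(State.SUBMITTED + "/" + Event.APP_REJECTED, State.FAILED);
     // Deliberately no entry for ACCEPTED + APP_REJECTED.
   }
 
   void handle(Event event) {
     State next = table.get(current + "/" + event);
     if (next == null) {
       System.err.println("Invalid event: " + event + " at " + current);
       return;
     }
     current = next;
   }
 
   public static void main(String[] args) {
     AppStateMachineSketch sm = new AppStateMachineSketch();
     sm.handle(Event.APP_ACCEPTED);  // SUBMITTED -> ACCEPTED
     sm.handle(Event.APP_REJECTED);  // no transition defined: reported, not applied
   }
 }
 {code}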



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2350) TestApplicationMasterServiceOnHA fails with InvalidToken exception

2014-07-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073973#comment-14073973
 ] 

Rohith commented on YARN-2350:
--

This issue is caused by the YARN-2208 check-in. As the complete solution for the 
AMRMToken roll-over task, YARN-2211 also needs to be checked in, which is in 
progress. I verified by applying the YARN-2211 patch; RM HA works fine for me.

 TestApplicationMasterServiceOnHA fails with InvalidToken exception
 --

 Key: YARN-2350
 URL: https://issues.apache.org/jira/browse/YARN-2350
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu

 From https://builds.apache.org/job/Hadoop-Yarn-trunk/622 :
 {code}
 Running org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.591 sec  
 FAILURE! - in org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
 testAllocateOnHA(org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA)
   Time elapsed: 8.408 sec   ERROR!
 org.apache.hadoop.security.token.SecretManager$InvalidToken: Given AMRMToken 
 for application : appattempt_1000_0001_00 seems to have been generated 
 illegally.
 at org.apache.hadoop.ipc.Client.call(Client.java:1411)
 at org.apache.hadoop.ipc.Client.call(Client.java:1364)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy85.allocate(Unknown Source)
 at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
 at com.sun.proxy.$Proxy86.allocate(Unknown Source)
 at 
 org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA.testAllocateOnHA(TestApplicationMasterServiceOnHA.java:84)
 {code}
 This is reproducible locally.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens

2014-07-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073981#comment-14073981
 ] 

Zhijie Shen commented on YARN-2247:
---

+1 except some nits:

1. I meant the RM has the same problem, and we need to do a null check:
{code}
+if (testMiniKDC != null) {
+  testMiniKDC.stop();
+}
+rm.stop();
{code}

2. Rename YarnAuthenticationFilter(Initializer) to RMAuthenticationFilter(Initializer).

 Allow RM web services users to authenticate using delegation tokens
 ---

 Key: YARN-2247
 URL: https://issues.apache.org/jira/browse/YARN-2247
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, 
 apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, apache-yarn-2247.4.patch


 The RM webapp should allow users to authenticate using delegation tokens to 
 maintain parity with RPC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073999#comment-14073999
 ] 

Hadoop QA commented on YARN-2209:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657746/YARN-2209.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
  
org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4424//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4424//console

This message is automatically generated.

 Replace allocate#resync command with ApplicationMasterNotRegisteredException 
 to indicate AM to re-register on RM restart
 

 Key: YARN-2209
 URL: https://issues.apache.org/jira/browse/YARN-2209
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch


 YARN-1365 introduced an ApplicationMasterNotRegisteredException to tell the 
 application to re-register on RM restart. We should do the same for the 
 AMS#allocate call as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2355) MAX_APP_ATTEMPTS_ENV may no longer be a useful env var for a container

2014-07-24 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2355:
-

 Summary: MAX_APP_ATTEMPTS_ENV may no longer be a useful env var 
for a container
 Key: YARN-2355
 URL: https://issues.apache.org/jira/browse/YARN-2355
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


After YARN-2074, YARN-614 and YARN-611, the application cannot judge whether it 
still has a chance to retry based on MAX_APP_ATTEMPTS_ENV alone. We should be able to 
notify the application of the up-to-date remaining retry quota.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1336) Work-preserving nodemanager restart

2014-07-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074010#comment-14074010
 ] 

Junping Du commented on YARN-1336:
--

Good work, [~jlowe]! Thanks for sharing.

 Work-preserving nodemanager restart
 ---

 Key: YARN-1336
 URL: https://issues.apache.org/jira/browse/YARN-1336
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: NMRestartDesignOverview.pdf, YARN-1336-rollup-v2.patch, 
 YARN-1336-rollup.patch


 This serves as an umbrella ticket for tasks related to work-preserving 
 nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2313) Livelock can occur in FairScheduler when there are lots of running apps

2014-07-24 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074018#comment-14074018
 ] 

Tsuyoshi OZAWA commented on YARN-2313:
--

Great. I'll check it.

 Livelock can occur in FairScheduler when there are lots of running apps
 ---

 Key: YARN-2313
 URL: https://issues.apache.org/jira/browse/YARN-2313
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, 
 YARN-2313.4.patch, rm-stack-trace.txt


 Observed a livelock in the FairScheduler when there are lots of entries in the queue. 
 After investigating the code, the following case can occur:
 1. {{update()}} called by the UpdateThread takes longer than 
 UPDATE_INTERVAL (500ms) when there are lots of queues.
 2. The UpdateThread goes into a busy loop.
 3. Other threads (AllocationFileReloader, 
 ResourceManager$SchedulerEventDispatcher) can wait forever, as illustrated by the sketch below.
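 A toy reproduction of that failure mode (stand-in code, not the real FairScheduler 
 UpdateThread; the lock and timings are made up): the thread only sleeps for whatever time 
 is left over after update(), so once update() itself exceeds the interval there is nothing 
 left to sleep, the loop goes busy, and anything waiting on the same lock starves.
 {code}
 public class UpdateLoopSketch {
   static final long UPDATE_INTERVAL_MS = 500;
   static final Object schedulerLock = new Object();
 
   static void update(long workMs) throws InterruptedException {
     synchronized (schedulerLock) {   // held for the whole update
       Thread.sleep(workMs);          // stands in for walking a huge list of queues
     }
   }
 
   public static void main(String[] args) throws InterruptedException {
     long workMs = 700;               // > UPDATE_INTERVAL_MS, as with many queues
     for (int i = 0; i < 3; i++) {
       long start = System.currentTimeMillis();
       update(workMs);
       long leftover = UPDATE_INTERVAL_MS - (System.currentTimeMillis() - start);
       System.out.println("iteration " + i + ", time left to sleep: " + leftover + " ms");
       if (leftover > 0) {
         Thread.sleep(leftover);      // never reached once update() is too slow
       }
     }
   }
 }
 {code}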



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074033#comment-14074033
 ] 

Hadoop QA commented on YARN-2336:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657750/YARN-2336-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4425//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4425//console

This message is automatically generated.

 Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
 --

 Key: YARN-2336
 URL: https://issues.apache.org/jira/browse/YARN-2336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Kenji Kikushima
Assignee: Kenji Kikushima
 Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336.patch


 When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a 
 missing '[' bracket for childQueues.
 This issue was found by [~ajisakaa] at YARN-1050.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness

2014-07-24 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2214:
-

Attachment: YARN-2214-v2.txt

 preemptContainerPreCheck() in FSParentQueue delays convergence towards 
 fairness
 ---

 Key: YARN-2214
 URL: https://issues.apache.org/jira/browse/YARN-2214
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt


 preemptContainerPreCheck() in FSParentQueue rejects preemption requests if 
 the parent queue is below its fair share. This can cause a delay in converging 
 towards fairness when the starved leaf queue and the queue above its fair share 
 belong under a non-root parent queue (i.e. their least common ancestor is a 
 parent queue which is not root).
 Here is an example:
 root.parent has fair share = 80% and usage = 80%
 root.parent.child1 has fair share = 40% and usage = 80%
 root.parent.child2 has fair share = 40% and usage = 0%
 Now a job is submitted to child2 and its demand is 40%.
 Preemption will kick in and try to reclaim all of the 40% from child1.
 When it preempts the first container from child1, the usage of root.parent 
 drops below 80%, i.e. below root.parent's fair share, causing 
 preemption to stop. So only one container gets preempted in this round, 
 although the need is a lot more. child2 would eventually get to half its fair 
 share, but only after multiple rounds of preemption.
 The solution is to remove preemptContainerPreCheck() from FSParentQueue and keep it 
 only in FSLeafQueue (where it already exists).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness

2014-07-24 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074039#comment-14074039
 ] 

Ashwin Shankar commented on YARN-2214:
--

Thanks, [~kasha]! Patch refreshed.

 preemptContainerPreCheck() in FSParentQueue delays convergence towards 
 fairness
 ---

 Key: YARN-2214
 URL: https://issues.apache.org/jira/browse/YARN-2214
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt


 preemptContainerPreCheck() in FSParentQueue rejects preemption requests if 
 the parent queue is below its fair share. This can cause a delay in converging 
 towards fairness when the starved leaf queue and the queue above its fair share 
 belong under a non-root parent queue (i.e. their least common ancestor is a 
 parent queue which is not root).
 Here is an example:
 root.parent has fair share = 80% and usage = 80%
 root.parent.child1 has fair share = 40% and usage = 80%
 root.parent.child2 has fair share = 40% and usage = 0%
 Now a job is submitted to child2 and its demand is 40%.
 Preemption will kick in and try to reclaim all of the 40% from child1.
 When it preempts the first container from child1, the usage of root.parent 
 drops below 80%, i.e. below root.parent's fair share, causing 
 preemption to stop. So only one container gets preempted in this round, 
 although the need is a lot more. child2 would eventually get to half its fair 
 share, but only after multiple rounds of preemption.
 The solution is to remove preemptContainerPreCheck() from FSParentQueue and keep it 
 only in FSLeafQueue (where it already exists).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2247) Allow RM web services users to authenticate using delegation tokens

2014-07-24 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2247:


Attachment: apache-yarn-2247.5.patch

{quote}
1. I meant RM has the same problem, and we need to do null check
{noformat}
+if (testMiniKDC != null) {
+  testMiniKDC.stop();
+}
+rm.stop();
{noformat}
{quote}

Got it. Fixed.

{quote}
2. Rename YarnAuthenticationFilter(Initializer) to RMAuthenticationFilter(Initializer).
{quote}

Fixed.

 Allow RM web services users to authenticate using delegation tokens
 ---

 Key: YARN-2247
 URL: https://issues.apache.org/jira/browse/YARN-2247
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, 
 apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, apache-yarn-2247.4.patch, 
 apache-yarn-2247.5.patch


 The RM webapp should allow users to authenticate using delegation tokens to 
 maintain parity with RPC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074063#comment-14074063
 ] 

Jian He commented on YARN-2229:
---

bq. ContainerTokenIdentifier serializes a long (getContainerId()) at RM side, 
but deserializes a int (getId()) at NM side. In this case, I'm afraid it's 
going to be wrong.
ContainerToken compatibility cannot be ensured until rolling upgrade support is 
complete, as mentioned here: 
https://issues.apache.org/jira/browse/YARN-2152?focusedCommentId=14061366page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14061366

Earlier, the id returned by getId() was supposed to be a monotonically 
increasing integer, and some application logic may depend on that (e.g. sorting 
containers based on that integer). The problem with the approach of adding a 
new field is that multiple containers may have the same id integer after the RM 
restarts. Thoughts?

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
 YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
 YARN-2229.8.patch, YARN-2229.9.patch


 In YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch, 
 and the lower 22 bits are for the sequence number of ids. This preserves the 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after 
 the RM restarts 1024 times.
 To avoid the problem, it is better to make containerId a long. We need to define 
 the new container id format on this JIRA while preserving backward 
 compatibility.
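 A worked illustration of the 32-bit layout described above, 10 bits of epoch plus 22 bits 
 of sequence, and of why the epoch wraps after 1024 restarts (standalone sketch, not the 
 real ContainerId implementation):
 {code}
 public class ContainerIdBitsSketch {
   static final int EPOCH_BITS = 10;
   static final int SEQ_BITS = 22;
 
   static int pack(int epoch, int seq) {
     return (epoch << SEQ_BITS) | (seq & ((1 << SEQ_BITS) - 1));
   }
 
   static int epochOf(int id) { return id >>> SEQ_BITS; }
   static int seqOf(int id)   { return id & ((1 << SEQ_BITS) - 1); }
 
   public static void main(String[] args) {
     int id = pack(5, 42);
     System.out.println(epochOf(id) + " / " + seqOf(id));            // 5 / 42
 
     // 10 bits only cover epochs 0..1023, so the 1024th restart wraps back to 0:
     int wrapped = pack(1024, 42);
     System.out.println(epochOf(wrapped) + " / " + seqOf(wrapped));  // 0 / 42
   }
 }
 {code}
 Widening the id to a long, as proposed, leaves room for the epoch without disturbing the 
 low sequence bits.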



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2214) preemptContainerPreCheck() in FSParentQueue delays convergence towards fairness

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074073#comment-14074073
 ] 

Hadoop QA commented on YARN-2214:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657768/YARN-2214-v2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4426//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4426//console

This message is automatically generated.

 preemptContainerPreCheck() in FSParentQueue delays convergence towards 
 fairness
 ---

 Key: YARN-2214
 URL: https://issues.apache.org/jira/browse/YARN-2214
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2214-v1.txt, YARN-2214-v2.txt


 preemptContainerPreCheck() in FSParentQueue rejects preemption requests if 
 the parent queue is below its fair share. This can cause a delay in converging 
 towards fairness when the starved leaf queue and the queue above its fair share 
 belong under a non-root parent queue (i.e. their least common ancestor is a 
 parent queue which is not root).
 Here is an example:
 root.parent has fair share = 80% and usage = 80%
 root.parent.child1 has fair share = 40% and usage = 80%
 root.parent.child2 has fair share = 40% and usage = 0%
 Now a job is submitted to child2 and its demand is 40%.
 Preemption will kick in and try to reclaim all of the 40% from child1.
 When it preempts the first container from child1, the usage of root.parent 
 drops below 80%, i.e. below root.parent's fair share, causing 
 preemption to stop. So only one container gets preempted in this round, 
 although the need is a lot more. child2 would eventually get to half its fair 
 share, but only after multiple rounds of preemption.
 The solution is to remove preemptContainerPreCheck() from FSParentQueue and keep it 
 only in FSLeafQueue (where it already exists).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens

2014-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074082#comment-14074082
 ] 

Hadoop QA commented on YARN-2247:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12657770/apache-yarn-2247.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4427//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4427//console

This message is automatically generated.

 Allow RM web services users to authenticate using delegation tokens
 ---

 Key: YARN-2247
 URL: https://issues.apache.org/jira/browse/YARN-2247
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, 
 apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, apache-yarn-2247.4.patch, 
 apache-yarn-2247.5.patch


 The RM webapp should allow users to authenticate using delegation tokens to 
 maintain parity with RPC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074094#comment-14074094
 ] 

Wangda Tan commented on YARN-415:
-

Hi Eric,
Thanks for updating your patch again.

*To your comments,*
bq. I was able to remove the rmApps variable, but I had to leave the check for 
app != null because if I try to take that out, several unit tests would fail 
with NullPointerException. Even with removing the rmApps variable, I needed to 
change TestRMContainerImpl.java to mock rmContext.getRMApps().
I would like to suggest fixing such UTs instead of inserting kernel code just to 
make the UTs pass. I'm not sure how much effort that would take; if the effort is 
still reasonable, we should do it.

bq. I'm still working on the unit tests as you suggested, but I wanted to get 
the rest of the patch up first so you can look at it 
No problem :), I can give some review comments on your existing changes in the meantime.

*I've reviewed some details of your patch; one very minor comment:*
ApplicationCLI.java
{code}
+  appReportStr.print(\tResources used : );
{code}
Do we need to change it to Resource Utilization as well?

Otherwise I think the patch almost LGTM; looking forward to your new patch containing an 
integration test.

Thanks,
Wangda

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.
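 The formula above translated directly into code (illustrative types only; not the actual 
 RM metrics classes):
 {code}
 public class MemorySecondsSketch {
   static class ContainerUsage {
     final long reservedMB;
     final long lifetimeSeconds;
     ContainerUsage(long reservedMB, long lifetimeSeconds) {
       this.reservedMB = reservedMB;
       this.lifetimeSeconds = lifetimeSeconds;
     }
   }
 
   static long memoryMbSeconds(ContainerUsage[] containers) {
     long total = 0;
     for (ContainerUsage c : containers) {
       total += c.reservedMB * c.lifetimeSeconds;   // reserved memory, not actual usage
     }
     return total;
   }
 
   public static void main(String[] args) {
     ContainerUsage[] app = {
         new ContainerUsage(2048, 120),   // 2 GB reserved for 2 minutes
         new ContainerUsage(1024, 600),   // 1 GB reserved for 10 minutes
     };
     System.out.println(memoryMbSeconds(app) + " MB-seconds");  // 860160
   }
 }
 {code}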



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2298) Move TimelineClient to yarn-common

2014-07-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074104#comment-14074104
 ] 

Jian He commented on YARN-2298:
---

Maybe it's better to add a dependency on the yarn-client module instead of moving the 
code, as the client libraries are split into separate packages. A similar 
problem happens if the RM wants to use NMClient to launch the AM.

 Move TimelineClient to yarn-common
 --

 Key: YARN-2298
 URL: https://issues.apache.org/jira/browse/YARN-2298
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2298.1.patch


 To allow RM to reuse the timeline client code, we have to move it out of 
 yarn-client module, due to maven dependency issues.



--
This message was sent by Atlassian JIRA
(v6.2#6252)