[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129743#comment-14129743 ] Zhijie Shen commented on YARN-2527: --- bq. But I believe, the NullPointer issue in ApplicationACLsManager should be fixed regardless of that. My concern is that each submitted app should have a map of acls in the acls manager, even if it is empty. If we make this change, we may hide some potential bugs. I looked into RMAppManager.createAndPopulateNewRMApp: app is put into RMContext first and then its acls is put into ApplicationACLsManager. In AppBlock, app is obtained from RMContext, and then its acls is going to be checked. If this happens after app is put into RMContext, but its acls isn't put into ApplicationACLsManager yet, user should hit the NPE case. Would you please see whether it is the race condition that you ran into before? NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
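As context for the discussion above, here is a minimal sketch of the kind of defensive check being debated, i.e. what a checkAccess-style method could do when an application's ACL map has not been registered yet (or has already been removed). All class, field and method names below are invented simplifications; this is not the actual ApplicationACLsManager API, nor the attached patch.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified ACL manager illustrating the race described above: the app is
// visible in RMContext before its ACLs are registered here, so checkAccess()
// can observe a missing entry and must not dereference it blindly.
public class SimpleAppAclsManager {

    private final Map<String, Map<String, String>> appAcls = new ConcurrentHashMap<>();

    public void addApplication(String appId, Map<String, String> acls) {
        appAcls.put(appId, acls);
    }

    public boolean checkAccess(String callerUser, String appOwner, String appId) {
        Map<String, String> acls = appAcls.get(appId);
        if (acls == null) {
            // ACLs not registered yet (or already removed): fall back to an
            // owner-only decision instead of throwing a NullPointerException.
            return callerUser.equals(appOwner);
        }
        String viewAcl = acls.get("VIEW_APP");
        return callerUser.equals(appOwner)
            || (viewAcl != null && (viewAcl.equals("*") || viewAcl.contains(callerUser)));
    }
}
{code}
Whether hiding the missing entry this way is desirable is exactly the concern raised in the comment: it can also hide the ordering bug between RMContext and the ACLs manager.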
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129755#comment-14129755 ] Remus Rusanu commented on YARN-1063: I have tested against today's trunk and the patch is identical, no need for any rebase on this. Winutils needs ability to create task as domain user Key: YARN-1063 URL: https://issues.apache.org/jira/browse/YARN-1063 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Environment: Windows Reporter: Kyle Leckie Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch h1. Summary: Securing a Hadoop cluster requires constructing some form of security boundary around the processes executed in YARN containers. Isolation based on Windows users seems most feasible. This approach is similar to the approach taken by the existing LinuxContainerExecutor. The current patch to winutils.exe adds the ability to create a process as a domain user. h1. Alternative Methods considered: h2. Process rights limited by security token restriction: On Windows, access decisions are made by examining the security token of a process. It is possible to spawn a process with a restricted security token. Any of the rights granted by SIDs of the default token may be restricted. It is possible to see this in action by examining the security token of a sandboxed process launched by a web browser. Typically the launched process will have a fully restricted token and needs to access machine resources through a dedicated broker process that enforces a custom security policy. This broker process mechanism would break compatibility with the typical Hadoop container process. The container process must be able to use standard function calls for disk and network IO. I performed some work looking at ways to ACL the local files to the specific launched user without granting rights to other processes launched on the same machine, but found this to be an overly complex solution. h2. Relying on APP containers: Recent versions of Windows have the ability to launch processes within an isolated container. Application containers are supported for execution of WinRT-based executables. This method was ruled out due to the lack of official support for standard Windows APIs. At some point in the future Windows may support functionality similar to BSD jails or Linux containers; at that point support for containers should be added. h1. Create As User Feature Description: h2. Usage: A new sub-command was added to the set of task commands. Here is the syntax: winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] Some notes: * The username specified is in the format of user@domain * The machine executing this command must be joined to the domain of the user specified * The domain controller must allow the account executing the command access to the user information. For this, join the account to the predefined group labeled Pre-Windows 2000 Compatible Access * The account running the command must have several rights on the local machine. These can be managed manually using secpol.msc: ** Act as part of the operating system - SE_TCB_NAME ** Replace a process-level token - SE_ASSIGNPRIMARYTOKEN_NAME ** Adjust memory quotas for a process - SE_INCREASE_QUOTA_NAME * The launched process will not have rights to the desktop so will not be able to display any information or create UI. 
* The launched process will have no network credentials. Any access to network resources that requires domain authentication will fail. h2. Implementation: Winutils performs the following steps: # Enable the required privileges for the current process. # Register as a trusted process with the Local Security Authority (LSA). # Create a new logon for the user passed on the command line. # Load/create a profile on the local machine for the new logon. # Create a new environment for the new logon. # Launch the new process in a job with the task name specified and using the created logon. # Wait for the job to exit. h2. Future work: The following work was scoped out of this check-in: * Support for non-domain users or machines that are not domain joined. * Support for privilege isolation by running the task launcher in a high-privilege service with access over an ACLed named pipe. 
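To make the createAsUser syntax above concrete, here is a hedged example of invoking it from Java. The winutils path, task name, user and command below are placeholders, and in a real cluster this call is made by the container executor rather than by user code.
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative only: runs a command through the documented
// "winutils task createAsUser" sub-command described above.
public class CreateAsUserExample {
    public static void main(String[] args) throws IOException, InterruptedException {
        String winutils = "C:\\hadoop\\bin\\winutils.exe";   // assumed install location
        List<String> cmd = Arrays.asList(
            winutils, "task", "createAsUser",
            "container_0001",                 // TASKNAME
            "jobuser@EXAMPLE.COM",            // USERNAME in user@domain form
            "cmd /c echo hello");             // COMMAND_LINE to run as that user
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());             // propagate the task's exit code
    }
}
{code}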
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129761#comment-14129761 ] Hadoop QA commented on YARN-611: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667941/YARN-611.9.rebase.patch against trunk revision 4be9517. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4884//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4884//console This message is automatically generated. Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch, YARN-611.9.patch, YARN-611.9.rebase.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. 
I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that determines when an AM is considered well behaved and when it's safe to reset its failure count back to zero. Every time an AM fails, the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is greater than max-retries, then the job should fail. If the AM has never failed, the retry count is less than max-retries, or the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing misbehaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail; if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of
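A rough sketch of the windowed retry logic proposed above, in plain Java; the class and field names are invented for illustration and this is not the RMAppImpl change itself.
{code}
// Tracks AM failures and applies the proposed retry-count-window-ms rule.
public class AmRetryWindow {
    private final int maxRetries;
    private final long windowMs;
    private int failureCount = 0;
    private long lastFailureTime = -1;

    public AmRetryWindow(int maxRetries, long windowMs) {
        this.maxRetries = maxRetries;
        this.windowMs = windowMs;
    }

    /** Returns true if the whole job should fail, false if the AM may be retried. */
    public synchronized boolean onAmFailure(long now) {
        if (lastFailureTime >= 0 && now - lastFailureTime > windowMs) {
            // The last failure is outside the window: the AM has been well
            // behaved, so forget the old failures before counting this one.
            failureCount = 0;
        }
        failureCount++;
        lastFailureTime = now;
        return failureCount > maxRetries;
    }
}
{code}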
[jira] [Commented] (YARN-1912) ResourceLocalizer started without any jvm memory control
[ https://issues.apache.org/jira/browse/YARN-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129799#comment-14129799 ] Guo Ruijing commented on YARN-1912: --- We may add a configurable parameter for the options in yarn-default.xml, such as:
<property>
  <description>When the nodemanager tries to localize the resources for Linux containers, it will use the following JVM opts to launch the Linux container localizer.</description>
  <name>yarn.nodemanager.linux-container-localizer.opts</name>
  <value>-Xmx1024m</value>
</property>
ResourceLocalizer started without any jvm memory control Key: YARN-1912 URL: https://issues.apache.org/jira/browse/YARN-1912 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: stanley shi Attachments: YARN-1912-0.patch, YARN-1912-1.patch In LinuxContainerExecutor.java#startLocalizer, it does not specify any -Xmx configuration in the command; this causes the ResourceLocalizer to be started with the default memory setting. On server-class hardware, it will use 25% of the system memory as the max heap size, which can cause memory issues in some cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
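For readers skimming the patch discussion, a sketch of how such an option could be spliced into the localizer launch command; this is not the actual LinuxContainerExecutor#startLocalizer code, and the default value below simply mirrors the proposal above.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

// Builds a java command line for the container localizer, honoring a
// configurable JVM-opts property as proposed in the comment above.
public class LocalizerCommandSketch {
    public static List<String> buildLocalizerCommand(Configuration conf, String mainClass) {
        String opts = conf.get(
            "yarn.nodemanager.linux-container-localizer.opts", "-Xmx1024m");
        List<String> cmd = new ArrayList<String>();
        cmd.add("java");
        for (String opt : opts.trim().split("\\s+")) {
            cmd.add(opt);        // e.g. -Xmx1024m picked up from yarn-site.xml
        }
        cmd.add(mainClass);      // the ContainerLocalizer entry point
        return cmd;
    }
}
{code}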
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Attachment: YARN-1972.delta.4.patch Patch delta.4 is a delta from YARN-1063, rebased to today's trunk Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user, as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to the LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The approach on the WCE came from practical trial and error. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself already deals with this via a so-called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container, the same issue had to be resolved, and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators group, or as LocalSystem. This is derived from the need to invoke the LoadUserProfile API, which mentions these requirements in its specification. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but that requirement automatically implies that the SE_TCB privilege is held by the nodemanager. For the Linux speakers in the audience, the requirement is basically to run the NM as root. h2.
Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high-privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers. I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service on the project is not trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
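As a quick reference for the two settings named in the Deployment Requirements above, here they are expressed programmatically; in a real deployment they would simply be two property entries in yarn-site.xml, and the group name used below is a placeholder.
{code}
import org.apache.hadoop.conf.Configuration;

// Minimal illustration of the WCE-related yarn-site.xml settings.
public class WceConfigExample {
    public static Configuration wceConf() {
        Configuration conf = new Configuration();
        conf.set("yarn.nodemanager.container-executor.class",
            "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
        // Windows security group that the NM service principal belongs to
        // ("nodemanagergrp" is a placeholder name).
        conf.set("yarn.nodemanager.windows-secure-container-executor.group",
            "nodemanagergrp");
        return conf;
    }
}
{code}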
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Attachment: YARN-1972.trunk.4.patch Patch trunk.4 is a diff from trunk, for Jenkins to evaluate Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.trunk.4.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user, as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to the LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The approach on the WCE came from practical trial and error. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself already deals with this via a so-called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container, the same issue had to be resolved, and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators group, or as LocalSystem. This is derived from the need to invoke the LoadUserProfile API, which mentions these requirements in its specification. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but that requirement automatically implies that the SE_TCB privilege is held by the nodemanager. For the Linux speakers in the audience, the requirement is basically to run the NM as root. h2.
Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high-privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers. I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service on the project is not trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1912) ResourceLocalizer started without any jvm memory control
[ https://issues.apache.org/jira/browse/YARN-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129812#comment-14129812 ] Hadoop QA commented on YARN-1912: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642478/YARN-1912-1.patch against trunk revision 4be9517. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4885//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4885//console This message is automatically generated. ResourceLocalizer started without any jvm memory control Key: YARN-1912 URL: https://issues.apache.org/jira/browse/YARN-1912 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Reporter: stanley shi Attachments: YARN-1912-0.patch, YARN-1912-1.patch In the LinuxContainerExecutor.java#startLocalizer, it does not specify any -Xmx configurations in the command, this caused the ResourceLocalizer to be started with default memory setting. In an server-level hardware, it will use 25% of the system memory as the max heap size, this will cause memory issue in some cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129837#comment-14129837 ] Wangda Tan commented on YARN-2496: -- Hi [~jianhe], Thanks for your comments, bq. CSQueueUtils.java format change only, we can revert Reverted bq. why checking labelManager != null every where ? we only need to check where it’s needed. It was used to reduce changes in tests. I think we should remove these checks and improve tests bq. We may not need to change the method signature to add one more parameter, just pass the queues map into NodeLabelManager#reinitializeQueueLabels, to avoid a number of test changes. Make sense, now reverted changes for related tests and get a queueToLabels after parseQueue bq. label initialization code is duplicated between ParentQueue and LeafQueue, how about creating an AbstractCSQueue and put common initialization methods there ? Make sense, there do have lots of common code between PQ and LQ, now I have merged all common parts to abstractCSQueue. Attached a new patch Thanks, Wangda [YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch This JIRA Includes: - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc. - Include a default-label-expression option in queue config, if an app doesn't specify label-expression, default-label-expression of queue will be used. - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
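To make the queue-side options described above a bit more tangible, here is a hedged example of what such a configuration could look like; the exact property names were still being settled in this patch, so the two label-related keys below are hypothetical stand-ins written in the style of existing capacity-scheduler.xml keys.
{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative capacity-scheduler settings: a queue with accessible labels
// and a default label expression, alongside the usual capacity options.
public class QueueLabelConfigSketch {
    public static Configuration exampleQueueLabels() {
        Configuration conf = new Configuration();
        conf.set("yarn.scheduler.capacity.root.queues", "a,b");
        conf.set("yarn.scheduler.capacity.root.a.capacity", "50");
        // Hypothetical keys for the options discussed in this JIRA:
        conf.set("yarn.scheduler.capacity.root.a.labels", "gpu,large-mem");
        conf.set("yarn.scheduler.capacity.root.a.default-label-expression", "gpu");
        return conf;
    }
}
{code}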
[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2496: - Attachment: YARN-2496.patch [YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch, YARN-2496.patch This JIRA Includes: - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc. - Include a default-label-expression option in queue config, if an app doesn't specify label-expression, default-label-expression of queue will be used. - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129845#comment-14129845 ] Hadoop QA commented on YARN-2496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667992/YARN-2496.patch against trunk revision 4be9517. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4887//console This message is automatically generated. [YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch, YARN-2496.patch This JIRA Includes: - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc. - Include a default-label-expression option in queue config, if an app doesn't specify label-expression, default-label-expression of queue will be used. - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129854#comment-14129854 ] Hadoop QA commented on YARN-1972: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667974/YARN-1972.trunk.4.patch against trunk revision 4be9517. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4886//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4886//console This message is automatically generated. Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.trunk.4.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user, as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to the LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The approach on the WCE came from practical trial and error.
I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself already deals with this via a so-called 'classpath jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container, the same issue had to be resolved, and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators
[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2496: - Attachment: YARN-2496.patch [YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch This JIRA Includes: - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc. - Include a default-label-expression option in queue config, if an app doesn't specify label-expression, default-label-expression of queue will be used. - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels
[ https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2500: - Attachment: YARN-2500.patch [YARN-796] Miscellaneous changes in ResourceManager to support labels - Key: YARN-2500 URL: https://issues.apache.org/jira/browse/YARN-2500 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2500.patch, YARN-2500.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129875#comment-14129875 ] Hadoop QA commented on YARN-2496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668003/YARN-2496.patch against trunk revision 4be9517. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4888//console This message is automatically generated. [YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch This JIRA Includes: - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc. - Include a default-label-expression option in queue config, if an app doesn't specify label-expression, default-label-expression of queue will be used. - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels
[ https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129877#comment-14129877 ] Hadoop QA commented on YARN-2500: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668004/YARN-2500.patch against trunk revision 4be9517. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4889//console This message is automatically generated. [YARN-796] Miscellaneous changes in ResourceManager to support labels - Key: YARN-2500 URL: https://issues.apache.org/jira/browse/YARN-2500 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2500.patch, YARN-2500.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1458) FairScheduler: Zero weight can lead to livelock
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129904#comment-14129904 ] Hudson commented on YARN-1458: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/677/]) YARN-1458. FairScheduler: Zero weight can lead to livelock. (Zhihai Xu via kasha) (kasha: rev 3072c83b38fd87318d502a7d1bc518963b5ccdf7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java * hadoop-yarn-project/CHANGES.txt FairScheduler: Zero weight can lead to livelock --- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, yarn-1458-7.patch, yarn-1458-8.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots jobs, it is not easy to reapear. We run the test cluster for days to reapear it. The output of jstack command on resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
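For readers trying to connect the stack trace above to the "zero weight" title, here is a deliberately simplified model of the fair-share computation; the real code lives in ComputeFairShares, and the class and method names below are illustrative only. With every weight equal to zero, the ratio search can never make progress, so the update thread spins while holding the scheduler lock and the event processor blocks behind it, which is the livelock being fixed.
{code}
import java.util.List;

// Toy version of the weight-to-resource-ratio search used for fair shares.
public class FairShareSketch {

    static int resourceUsedWithRatio(double ratio, List<Double> weights) {
        int total = 0;
        for (double w : weights) {
            total += (int) (w * ratio);   // a zero weight contributes nothing at any ratio
        }
        return total;
    }

    static double findRatio(int totalResource, List<Double> weights) {
        double sum = 0;
        for (double w : weights) {
            sum += w;
        }
        if (sum <= 0) {
            return 0; // without a guard like this the loop below never terminates
        }
        double ratio = 1.0;
        while (resourceUsedWithRatio(ratio, weights) < totalResource) {
            ratio *= 2.0;                  // grow until the shares cover the resource
        }
        return ratio;
    }
}
{code}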
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129908#comment-14129908 ] Hudson commented on YARN-415: - SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/677/]) YARN-415. Capture aggregate memory allocation at the app-level for chargeback. Contributed by Eric Payne Andrey Klochkov (jianhe: rev 83be3ad44484bf8a24cb90de4b9c26ab59d226a8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AggregateAppResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java *
[jira] [Commented] (YARN-2448) RM should expose the resource types considered during scheduling when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129911#comment-14129911 ] Hudson commented on YARN-2448: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/677/]) YARN-2448. Changed ApplicationMasterProtocol to expose RM-recognized resource types to the AMs. Contributed by Varun Vasudev. (vinodkv: rev b67d5ba7842cc10695d987f217027848a5a8c3d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/RegisterApplicationMasterResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/RegisterApplicationMasterResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java RM should expose the resource types considered during scheduling when AMs register -- Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, apache-yarn-2448.2.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce for example, only looks at memory when deciding it's scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
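A short sketch of how an AM might consume the new registration information; the accessor and enum names below reflect my reading of the committed patch, but treat them as assumptions rather than a definitive API reference.
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.proto.YarnServiceProtos.SchedulerResourceTypes;

// Inspect which resource types the RM scheduler actually considers.
public class AmRegistrationExample {
    static void inspect(RegisterApplicationMasterResponse response) {
        EnumSet<SchedulerResourceTypes> types = response.getSchedulerResourceTypes();
        if (types.contains(SchedulerResourceTypes.CPU)) {
            // DominantResourceCalculator-style scheduling: vcores matter too,
            // so the AM should size CPU requests as carefully as memory.
            System.out.println("Scheduler accounts for CPU as well: " + types);
        } else {
            System.out.println("Scheduler accounts for memory only: " + types);
        }
    }
}
{code}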
[jira] [Commented] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129907#comment-14129907 ] Hudson commented on YARN-2158: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/677/]) YARN-2158. Fixed TestRMWebServicesAppsModification#testSingleAppKill test failure. Contributed by Varun Vasudev (jianhe: rev cbfe26370b85161c79fdd48bf69c95d5725d8f6a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/CHANGES.txt TestRMWebServicesAppsModification sometimes fails in trunk -- Key: YARN-2158 URL: https://issues.apache.org/jira/browse/YARN-2158 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Varun Vasudev Priority: Minor Fix For: 2.6.0 Attachments: apache-yarn-2158.0.patch, apache-yarn-2158.1.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console : {code} Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.297 sec FAILURE! java.lang.AssertionError: app state incorrect at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129906#comment-14129906 ] Hudson commented on YARN-2440: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/677/]) YARN-2440. Enabled Nodemanagers to limit the aggregate cpu usage across all containers to a preconfigured limit. Contributed by Varun Vasudev. (vinodkv: rev 4be95175cdb58ff12a27ab443d609d3b46da7bfa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestNodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java Cgroups should allow YARN containers to be limited to allocated cores - Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, apache-yarn-2440.5.patch, apache-yarn-2440.6.patch, screenshot-current-implementation.jpg The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129910#comment-14129910 ] Hudson commented on YARN-2459: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #677 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/677/]) YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 (xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60) * hadoop-yarn-project/CHANGES.txt RM crashes if App gets rejected for any reason and HA is enabled Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: 2.6.0 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch If RM HA is enabled and the ZooKeeper store is used for the RM state store: if, for any reason, an app gets rejected and goes directly from NEW to FAILED, the final transition adds it to the RMApps and completed-apps in-memory structures, but it never makes it to the state store. Now, when the RMApps default limit is reached, the RM starts deleting apps from memory and from the store. In that case it tries to delete this app from the store and fails, which causes the RM to crash. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2531) CGroups - Admins should be allowed to enforce strict cpu limits
[ https://issues.apache.org/jira/browse/YARN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2531: Attachment: apache-yarn-2531.0.patch Uploaded patch with added config. CGroups - Admins should be allowed to enforce strict cpu limits --- Key: YARN-2531 URL: https://issues.apache.org/jira/browse/YARN-2531 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2531.0.patch From YARN-2440 - {quote} The other dimension to this is determinism w.r.t performance. Limiting to allocated cores overall (as well as per container later) helps orgs run workloads and reason about them deterministically. One of the examples is benchmarking apps, but deterministic execution is a desired option beyond benchmarks too. {quote} It would be nice to have an option to let admins to enforce strict cpu limits for apps for things like benchmarking, etc. By default this flag should be off so that containers can use available cpu but admin can turn the flag on to determine worst case performance, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
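To illustrate what a "strict" limit means in cgroups terms, here is a back-of-the-envelope sketch: instead of only weighting containers via cpu.shares, a strict mode would also cap cpu.cfs_quota_us so a container can never exceed its allocated fraction of the node. The constants and arithmetic below are illustrative and are not taken from the attached patch.
{code}
// Computes a CFS quota for one container cgroup under a strict CPU limit.
public class StrictCpuLimitSketch {
    static final int CFS_PERIOD_US = 1000 * 1000; // 1 second scheduling period

    /**
     * @param containerVCores vcores allocated to the container
     * @param nodeVCores      vcores configured for the whole NodeManager
     * @param physicalCores   physical cores visible to the OS
     */
    static int cfsQuotaUs(int containerVCores, int nodeVCores, int physicalCores) {
        // Fraction of the node this container owns, converted into CPU time
        // per period across the machine's physical cores.
        double fraction = (double) containerVCores / nodeVCores;
        return (int) (CFS_PERIOD_US * physicalCores * fraction);
    }

    public static void main(String[] args) {
        // 2 of 8 vcores on a 4-core box -> 1,000,000us of CPU per 1,000,000us period,
        // i.e. at most one full core, regardless of how idle the node is.
        System.out.println(cfsQuotaUs(2, 8, 4));
    }
}
{code}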
[jira] [Commented] (YARN-2531) CGroups - Admins should be allowed to enforce strict cpu limits
[ https://issues.apache.org/jira/browse/YARN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129974#comment-14129974 ] Hadoop QA commented on YARN-2531: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668046/apache-yarn-2531.0.patch against trunk revision 4be9517. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4890//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4890//console This message is automatically generated. CGroups - Admins should be allowed to enforce strict cpu limits --- Key: YARN-2531 URL: https://issues.apache.org/jira/browse/YARN-2531 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2531.0.patch From YARN-2440 - {quote} The other dimension to this is determinism w.r.t performance. Limiting to allocated cores overall (as well as per container later) helps orgs run workloads and reason about them deterministically. One of the examples is benchmarking apps, but deterministic execution is a desired option beyond benchmarks too. {quote} It would be nice to have an option to let admins to enforce strict cpu limits for apps for things like benchmarking, etc. By default this flag should be off so that containers can use available cpu but admin can turn the flag on to determine worst case performance, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.delta.4.patch delta.4.patch is based on YARN-1972. Is rebased to current trunk and integrates YARN-2458 as well. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.separation.patch YARN-1972 introduces a Secure Windows Container Executor. However this executor requires a the process launching the container to be LocalSystem or a member of the a local Administrators group. Since the process in question is the NodeManager, the requirement translates to the entire NM to run as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. My proposal though would be to use Windows LPC (Local Procedure Calls), which is a Windows platform specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication and the privileged NT service can use authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
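To make the NM / privileged NT service split described above more concrete, here is a purely hypothetical sketch of what the NodeManager-side JNI surface over the libwinutils LPC client could look like; the real interface is whatever the patch defines, and every name and signature below is illustrative.
{code}
// Purely illustrative sketch of an NM-side JNI binding to the LPC client hosted in
// libwinutils. Names, signatures, and the library name are hypothetical.
public final class WinutilsLpcClient {
  static {
    System.loadLibrary("winutils"); // hypothetical JNI library name
  }

  // Connects to the privileged NT service's LPC port (NtConnectPort) and sends a
  // container-launch request (NtRequestWaitReplyPort); the service performs the
  // privileged createAsUser work and replies with an exit status.
  public static native int launchContainerAsUser(String user,
                                                 String taskName,
                                                 String commandLine);
}
{code}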
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.trunk.4.patch trunk.4.patch is same as delta.4.patch but is diff from trunk Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch YARN-1972 introduces a Secure Windows Container Executor. However this executor requires a the process launching the container to be LocalSystem or a member of the a local Administrators group. Since the process in question is the NodeManager, the requirement translates to the entire NM to run as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. My proposal though would be to use Windows LPC (Local Procedure Calls), which is a Windows platform specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication and the privileged NT service can use authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129983#comment-14129983 ] Hadoop QA commented on YARN-2198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668056/YARN-2198.trunk.4.patch against trunk revision 4be9517. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4891//console This message is automatically generated. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch YARN-1972 introduces a Secure Windows Container Executor. However this executor requires a the process launching the container to be LocalSystem or a member of the a local Administrators group. Since the process in question is the NodeManager, the requirement translates to the entire NM to run as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. My proposal though would be to use Windows LPC (Local Procedure Calls), which is a Windows platform specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication and the privileged NT service can use authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130032#comment-14130032 ] Hudson commented on YARN-2459: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/]) YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 (xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60) * hadoop-yarn-project/CHANGES.txt RM crashes if App gets rejected for any reason and HA is enabled Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: 2.6.0 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch If RM HA is enabled and used Zookeeper store for RM State Store. If for any reason Any app gets rejected and directly goes to NEW to FAILED then final transition makes that to RMApps and Completed Apps memory structure but that doesn't make it to State store. Now when RMApps default limit reaches it starts deleting apps from memory and store. In that case it try to delete this app from store and fails which causes RM to crash. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1458) FairScheduler: Zero weight can lead to livelock
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130026#comment-14130026 ] Hudson commented on YARN-1458: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/]) YARN-1458. FairScheduler: Zero weight can lead to livelock. (Zhihai Xu via kasha) (kasha: rev 3072c83b38fd87318d502a7d1bc518963b5ccdf7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt FairScheduler: Zero weight can lead to livelock --- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, yarn-1458-7.patch, yarn-1458-8.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots jobs, it is not easy to reapear. We run the test cluster for days to reapear it. The output of jstack command on resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
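The livelock reported here comes from the fair-share computation: ComputeFairShares searches for a weight-to-resource ratio whose resulting shares add up to the available resource, and with all-zero weights the computed total never grows, so the search cannot make progress while the scheduler lock is held. Below is a simplified, self-contained sketch of that failure mode (names are illustrative, and an artificial cap is added so the demo terminates).
{code}
// Simplified illustration of the zero-weight livelock. With every weight equal to
// zero, the computed total share stays at zero, so the ratio-doubling loop in the
// real scheduler never reaches the target and spins forever under the scheduler lock.
public class ZeroWeightLivelockDemo {
  static int resourceUsedWithRatio(double[] weights, double ratio) {
    int total = 0;
    for (double w : weights) {
      total += (int) (w * ratio); // share computed for one schedulable
    }
    return total;
  }

  public static void main(String[] args) {
    double[] weights = {0.0, 0.0, 0.0}; // all queues/apps have zero weight
    int totalResource = 8192;
    double ratio = 1.0;
    int iterations = 0;
    // Mirrors the "keep doubling until the shares overshoot" step; capped here so
    // the demo stops, whereas the real loop has no such cap.
    while (resourceUsedWithRatio(weights, ratio) < totalResource && iterations < 50) {
      ratio *= 2;
      iterations++;
    }
    System.out.println("Stopped after " + iterations
        + " doublings; computed share total is still "
        + resourceUsedWithRatio(weights, ratio));
  }
}
{code}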
[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130028#comment-14130028 ] Hudson commented on YARN-2440: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/]) YARN-2440. Enabled Nodemanagers to limit the aggregate cpu usage across all containers to a preconfigured limit. Contributed by Varun Vasudev. (vinodkv: rev 4be95175cdb58ff12a27ab443d609d3b46da7bfa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestNodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java Cgroups should allow YARN containers to be limited to allocated cores - Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, apache-yarn-2440.5.patch, apache-yarn-2440.6.patch, screenshot-current-implementation.jpg The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
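The change above caps the aggregate CPU of all containers using cgroups CFS bandwidth control. As a hedged sketch of the arithmetic involved (the constants and property name are illustrative; the real CgroupsLCEResourcesHandler also has to respect kernel limits on these values):
{code}
// Rough sketch of deriving a CFS period/quota pair from a configured percentage of
// the node's physical CPU. Not the patch's exact code.
public class CpuCapSketch {
  public static void main(String[] args) {
    int numCores = Runtime.getRuntime().availableProcessors();
    int percentageLimit = 80;     // e.g. yarn.nodemanager.resource.percentage-physical-cpu-limit
    int periodUs = 1000 * 1000;   // cpu.cfs_period_us: one-second scheduling period

    // Quota is the CPU time all YARN containers may consume per period:
    // 80% of an 8-core node => 6.4 cores => 6,400,000 us of CPU time per second.
    long quotaUs = (long) (periodUs * numCores * (percentageLimit / 100.0));

    System.out.println("cpu.cfs_period_us = " + periodUs);
    System.out.println("cpu.cfs_quota_us  = " + quotaUs);
  }
}
{code}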
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130030#comment-14130030 ] Hudson commented on YARN-415: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/]) YARN-415. Capture aggregate memory allocation at the app-level for chargeback. Contributed by Eric Payne Andrey Klochkov (jianhe: rev 83be3ad44484bf8a24cb90de4b9c26ab59d226a8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationResourceUsageReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AggregateAppResourceUsage.java *
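The chargeback metric captured by the YARN-415 change above is essentially resource-time: memory-seconds and vcore-seconds accumulated while an application's containers run. A small illustration of the arithmetic follows; the variable names are hypothetical and not the patch's API.
{code}
// Illustrative arithmetic for per-application chargeback accounting:
// MB-seconds and vcore-seconds for one finished container.
public class MemorySecondsSketch {
  public static void main(String[] args) {
    long memoryMb = 2048;         // container memory allocation
    long vcores = 2;              // container vcore allocation
    long startMs = 1_000_000L;
    long finishMs = 1_360_000L;   // container ran for 360 seconds

    long usedSeconds = (finishMs - startMs) / 1000;
    long memorySeconds = memoryMb * usedSeconds;  // 2048 MB * 360 s = 737,280 MB-s
    long vcoreSeconds = vcores * usedSeconds;     // 2 vcores * 360 s = 720 vcore-s

    System.out.println("memorySeconds=" + memorySeconds
        + ", vcoreSeconds=" + vcoreSeconds);
  }
}
{code}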
[jira] [Commented] (YARN-2448) RM should expose the resource types considered during scheduling when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130033#comment-14130033 ] Hudson commented on YARN-2448: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/]) YARN-2448. Changed ApplicationMasterProtocol to expose RM-recognized resource types to the AMs. Contributed by Varun Vasudev. (vinodkv: rev b67d5ba7842cc10695d987f217027848a5a8c3d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/RegisterApplicationMasterResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/RegisterApplicationMasterResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto RM should expose the resource types considered during scheduling when AMs register -- Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, apache-yarn-2448.2.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce for example, only looks at memory when deciding it's scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
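Below is a sketch of how an ApplicationMaster might act on this information at registration time. It assumes the response exposes the scheduler's resource types through an accessor along the lines of getSchedulerResourceTypes(); the enum in the sketch is a stand-in for whatever the committed patch defines.
{code}
// Sketch of an AM adjusting its resource requests based on which resource types the
// scheduler actually accounts for. The enum is a stand-in, not the real API.
import java.util.EnumSet;

public class AmRegistrationSketch {
  enum SchedulerResourceTypes { MEMORY, CPU }   // stand-in for the patch's enum

  static void planRequests(EnumSet<SchedulerResourceTypes> types) {
    if (types.contains(SchedulerResourceTypes.CPU)) {
      // DominantResourceCalculator (or similar): size requests on vcores as well.
      System.out.println("Scheduler accounts for CPU; request meaningful vcores.");
    } else {
      // DefaultResourceCalculator: only memory matters for scheduling decisions.
      System.out.println("Scheduler is memory-only; vcore values are ignored.");
    }
  }

  public static void main(String[] args) {
    planRequests(EnumSet.of(SchedulerResourceTypes.MEMORY, SchedulerResourceTypes.CPU));
  }
}
{code}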
[jira] [Commented] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130029#comment-14130029 ] Hudson commented on YARN-2158: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1893/]) YARN-2158. Fixed TestRMWebServicesAppsModification#testSingleAppKill test failure. Contributed by Varun Vasudev (jianhe: rev cbfe26370b85161c79fdd48bf69c95d5725d8f6a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java TestRMWebServicesAppsModification sometimes fails in trunk -- Key: YARN-2158 URL: https://issues.apache.org/jira/browse/YARN-2158 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Varun Vasudev Priority: Minor Fix For: 2.6.0 Attachments: apache-yarn-2158.0.patch, apache-yarn-2158.1.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console : {code} Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.297 sec FAILURE! java.lang.AssertionError: app state incorrect at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
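For reference, one common way to de-flake a check like testSingleAppKill is to poll for the expected application state for a bounded time rather than asserting on the first response, since the kill is processed asynchronously by the RM. The following is only a sketch of that pattern, not necessarily what the committed fix does.
{code}
// Generic "wait for state" helper: retries the state check until it matches or a
// timeout expires, instead of asserting on a single snapshot.
public class WaitForStateSketch {
  interface StateReader { String currentState() throws Exception; }

  static boolean waitForState(StateReader reader, String expected, long timeoutMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(reader.currentState())) {
        return true;
      }
      Thread.sleep(100);   // small poll interval between checks
    }
    return expected.equals(reader.currentState()); // final check at the deadline
  }
}
{code}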
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2033: - Attachment: YARN-2033.13.patch The .12 patch no longer applies due to conflicts with the latest changes on trunk. Rebased the patch against the latest trunk and made slight updates in .13. Will commit it pending the Jenkins test. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.10.patch, YARN-2033.11.patch, YARN-2033.12.patch, YARN-2033.13.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, YARN-2033.9.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amenable to generic insights into what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and keep most of the client-side interfaces as close as possible to what we have today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1458) FairScheduler: Zero weight can lead to livelock
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130049#comment-14130049 ] Hudson commented on YARN-1458: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/]) YARN-1458. FairScheduler: Zero weight can lead to livelock. (Zhihai Xu via kasha) (kasha: rev 3072c83b38fd87318d502a7d1bc518963b5ccdf7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: Zero weight can lead to livelock --- Key: YARN-1458 URL: https://issues.apache.org/jira/browse/YARN-1458 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: Centos 2.6.18-238.19.1.el5 X86_64 hadoop2.2.0 Reporter: qingwu.fu Assignee: zhihai xu Labels: patch Attachments: YARN-1458.001.patch, YARN-1458.002.patch, YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, yarn-1458-7.patch, yarn-1458-8.patch Original Estimate: 408h Remaining Estimate: 408h The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when clients submit lots jobs, it is not easy to reapear. We run the test cluster for days to reapear it. The output of jstack command on resourcemanager pid: {code} ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 waiting for monitor entry [0x43aa9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) - waiting to lock 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) at java.lang.Thread.run(Thread.java:744) …… FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 runnable [0x433a2000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) - locked 0x00070026b6e0 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) at java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2158) TestRMWebServicesAppsModification sometimes fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130052#comment-14130052 ] Hudson commented on YARN-2158: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/]) YARN-2158. Fixed TestRMWebServicesAppsModification#testSingleAppKill test failure. Contributed by Varun Vasudev (jianhe: rev cbfe26370b85161c79fdd48bf69c95d5725d8f6a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/CHANGES.txt TestRMWebServicesAppsModification sometimes fails in trunk -- Key: YARN-2158 URL: https://issues.apache.org/jira/browse/YARN-2158 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Varun Vasudev Priority: Minor Fix For: 2.6.0 Attachments: apache-yarn-2158.0.patch, apache-yarn-2158.1.patch From https://builds.apache.org/job/Hadoop-Yarn-trunk/582/console : {code} Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 66.144 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification testSingleAppKill[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification) Time elapsed: 2.297 sec FAILURE! java.lang.AssertionError: app state incorrect at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.verifyAppStateJson(TestRMWebServicesAppsModification.java:398) at org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:289) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2448) RM should expose the resource types considered during scheduling when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130056#comment-14130056 ] Hudson commented on YARN-2448: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/]) YARN-2448. Changed ApplicationMasterProtocol to expose RM-recognized resource types to the AMs. Contributed by Varun Vasudev. (vinodkv: rev b67d5ba7842cc10695d987f217027848a5a8c3d8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/RegisterApplicationMasterResponse.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/RegisterApplicationMasterResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto RM should expose the resource types considered during scheduling when AMs register -- Key: YARN-2448 URL: https://issues.apache.org/jira/browse/YARN-2448 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch, apache-yarn-2448.2.patch The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better decisions when scheduling. MapReduce for example, only looks at memory when deciding it's scheduling, even though the RM could potentially be using the DominantResourceCalculator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130055#comment-14130055 ] Hudson commented on YARN-2459: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/]) YARN-2459. RM crashes if App gets rejected for any reason and HA is enabled. Contributed by Jian He (xgong: rev 47bdfa044aa1d587b24edae8b1b0c796d829c960) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java Fix CHANGES.txt. Credit Mayank Bansal for his contributions on YARN-2459 (xgong: rev 7d38ffc8d3500d428bdad5640e9e70d66ed5ea60) * hadoop-yarn-project/CHANGES.txt RM crashes if App gets rejected for any reason and HA is enabled Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: 2.6.0 Attachments: YARN-2459-1.patch, YARN-2459-2.patch, YARN-2459.3.patch, YARN-2459.4.patch, YARN-2459.5.patch, YARN-2459.6.patch If RM HA is enabled and used Zookeeper store for RM State Store. If for any reason Any app gets rejected and directly goes to NEW to FAILED then final transition makes that to RMApps and Completed Apps memory structure but that doesn't make it to State store. Now when RMApps default limit reaches it starts deleting apps from memory and store. In that case it try to delete this app from store and fails which causes RM to crash. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2440) Cgroups should allow YARN containers to be limited to allocated cores
[ https://issues.apache.org/jira/browse/YARN-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130051#comment-14130051 ] Hudson commented on YARN-2440: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/]) YARN-2440. Enabled Nodemanagers to limit the aggregate cpu usage across all containers to a preconfigured limit. Contributed by Varun Vasudev. (vinodkv: rev 4be95175cdb58ff12a27ab443d609d3b46da7bfa) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestNodeManagerHardwareUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java Cgroups should allow YARN containers to be limited to allocated cores - Key: YARN-2440 URL: https://issues.apache.org/jira/browse/YARN-2440 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2440.0.patch, apache-yarn-2440.1.patch, apache-yarn-2440.2.patch, apache-yarn-2440.3.patch, apache-yarn-2440.4.patch, apache-yarn-2440.5.patch, apache-yarn-2440.6.patch, screenshot-current-implementation.jpg The current cgroups implementation does not limit YARN containers to the cores allocated in yarn-site.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130053#comment-14130053 ] Hudson commented on YARN-415: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1868 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1868/]) YARN-415. Capture aggregate memory allocation at the app-level for chargeback. Contributed by Eric Payne Andrey Klochkov (jianhe: rev 83be3ad44484bf8a24cb90de4b9c26ab59d226a8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestContainerResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/ApplicationAttemptStateData.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebAppFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationResourceUsageReportPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java *
[jira] [Created] (YARN-2535) Test JIRA, ignore.
Allen Wittenauer created YARN-2535: -- Summary: Test JIRA, ignore. Key: YARN-2535 URL: https://issues.apache.org/jira/browse/YARN-2535 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2535) Test JIRA, ignore.
[ https://issues.apache.org/jira/browse/YARN-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2535: --- Labels: releasenotes (was: ) Test JIRA, ignore. -- Key: YARN-2535 URL: https://issues.apache.org/jira/browse/YARN-2535 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Labels: releasenotes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2535) Test JIRA, ignore.
[ https://issues.apache.org/jira/browse/YARN-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2535: --- Issue Type: Improvement (was: Bug) Test JIRA, ignore. -- Key: YARN-2535 URL: https://issues.apache.org/jira/browse/YARN-2535 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2535) Test JIRA, ignore.
[ https://issues.apache.org/jira/browse/YARN-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2535: --- Labels: (was: releasenotes) Test JIRA, ignore. -- Key: YARN-2535 URL: https://issues.apache.org/jira/browse/YARN-2535 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2535) Test JIRA, ignore.
[ https://issues.apache.org/jira/browse/YARN-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2535: --- Hadoop Flags: (was: Incompatible change) Test JIRA, ignore. -- Key: YARN-2535 URL: https://issues.apache.org/jira/browse/YARN-2535 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2536) YARN project doesn't have release notes field
Allen Wittenauer created YARN-2536: -- Summary: YARN project doesn't have release notes field Key: YARN-2536 URL: https://issues.apache.org/jira/browse/YARN-2536 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Looking through the Hadoop project JIRAs, I noticed that YARN doesn't seem to have a release notes field. I'm not sure how or why, given that, from what I can see on the JIRA administration side, it should. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Deleted] (YARN-2536) YARN project doesn't have release notes field
[ https://issues.apache.org/jira/browse/YARN-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer deleted YARN-2536: --- YARN project doesn't have release notes field - Key: YARN-2536 URL: https://issues.apache.org/jira/browse/YARN-2536 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Looking through the Hadoop project JIRAs, I noticed that YARN doesn't seem to have a release notes field. I'm not sure how or why, given that, from what I can see on the JIRA administration side, it should. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2535) Test JIRA, ignore.
[ https://issues.apache.org/jira/browse/YARN-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reassigned YARN-2535: -- Assignee: Allen Wittenauer Test JIRA, ignore. -- Key: YARN-2535 URL: https://issues.apache.org/jira/browse/YARN-2535 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Allen Wittenauer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2537) relnotes.py prints description instead of release note for YARN issues
Allen Wittenauer created YARN-2537: -- Summary: relnotes.py prints description instead of release note for YARN issues Key: YARN-2537 URL: https://issues.apache.org/jira/browse/YARN-2537 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Currently, the release notes for YARN always print the description JIRA field instead of the release note. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2537) relnotes.py prints description instead of release note for YARN issues
[ https://issues.apache.org/jira/browse/YARN-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130164#comment-14130164 ] Allen Wittenauer commented on YARN-2537: Digging deeper into the issue, it looks like JIRA is lacking a release notes field for YARN! Opened INFRA-8338 to see if they can figure out why YARN lacks that field. relnotes.py prints description instead of release note for YARN issues -- Key: YARN-2537 URL: https://issues.apache.org/jira/browse/YARN-2537 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Currently, the release notes for YARN always print the description JIRA field instead of the release note. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2464) Provide Hadoop as a local resource (on HDFS) which can be used by other projects
[ https://issues.apache.org/jira/browse/YARN-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-2464: Assignee: Junping Du Provide Hadoop as a local resource (on HDFS) which can be used by other projects Key: YARN-2464 URL: https://issues.apache.org/jira/browse/YARN-2464 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Assignee: Junping Du DEFAULT_YARN_APPLICATION_CLASSPATH is used by YARN projects to set up their AM / task classpaths if they have a dependency on Hadoop libraries. It'll be useful to provide similar access to a Hadoop tarball (Hadoop libs, native libraries, etc.), which could be used instead - for applications which do not want to rely upon Hadoop versions from a cluster node. This would also require functionality to update the classpath/env for the apps based on the structure of the tar. As an example, MR has support for a full tar (for rolling upgrades). Similarly, Tez ships Hadoop libraries along with its build. I'm not sure about the Spark / Storm / HBase model for this - but using a common copy instead of everyone localizing Hadoop libraries would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed
[ https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130268#comment-14130268 ] Ashwin Shankar commented on YARN-2104: -- Hi [~wangda], This issue still happens for FairScheduler (in trunk) but not CapacityScheduler. Can you please check? Steps to reproduce: 1. Run an app in the default queue. 2. While the app is running, go to the scheduler page on the RM UI. 3. You would see the app in the app table at the bottom. 4. Now click the default queue to filter the app table on queue name. 5. The app disappears from the app table although it is running in the default queue. Scheduler queue filter failed to work because index of queue column changed --- Key: YARN-2104 URL: https://issues.apache.org/jira/browse/YARN-2104 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2104.patch YARN-563 added {code} + th(".type", "Application Type") {code} to the application table, which moved the queue column's index from 3 to 4. But in the scheduler page, the queue column index is hard-coded to 3 when filtering applications by queue name: {code} if (q == 'root') q = ''; else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$'; $('#apps').dataTable().fnFilter(q, 3, true); {code} So the queue filter will not work on the applications page. Reproduce steps: (Thanks Bo Yang for pointing this out) {code} 1) In the default setup, there's a default queue under the root queue 2) Run an arbitrary application; you can find it in the "Applications" page 3) Click the "Default" queue in the scheduler page 4) Click "Applications"; no application will show here 5) Click the "Root" queue in the scheduler page 6) Click "Applications"; the application will show again {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130282#comment-14130282 ] Xuan Gong commented on YARN-611: Test case (org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat) failure is unrelated. Add an AM retry count reset window to YARN RM - Key: YARN-611 URL: https://issues.apache.org/jira/browse/YARN-611 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Chris Riccomini Assignee: Xuan Gong Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch, YARN-611.6.patch, YARN-611.7.patch, YARN-611.8.patch, YARN-611.9.patch, YARN-611.9.rebase.patch YARN currently has the following config: yarn.resourcemanager.am.max-retries This config defaults to 2, and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node that it was running on dies (the NM will timeout, which counts as a failure for the AM), or if the AM dies. This configuration is insufficient for long running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will result in the AM failure count going above the configured value for yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed, and shut it down. This behavior is not ideal. I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms This configuration would define a window of time that would define when an AM is well behaved, and it's safe to reset its failure count back to zero. Every time an AM fails, the RmAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is at or above max-retries, then the job should fail. If the AM has never failed, the retry count is below max-retries, or if the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing misbehaving AMs after a short period of time. I think the work to be done here is to change the RmAppImpl to actually look at app.attempts, and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, then the job should fail; if not, then the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the failure. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
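To make the proposed window semantics concrete, here is a minimal sketch of the check described above, assuming we already have the end times of past failed attempts; the helper is hypothetical, not the actual RmAppImpl change:
{code}
import java.util.List;

public class AmRetryWindow {
  // Counts only failures that ended inside the window; failures older than
  // retry-count-window-ms are effectively "forgotten".
  static boolean shouldFailApp(List<Long> failedAttemptEndTimes,
      int maxRetries, long windowMs, long now) {
    int recentFailures = 0;
    for (long endTime : failedAttemptEndTimes) {
      if (now - endTime < windowMs) {
        recentFailures++;
      }
    }
    return recentFailures >= maxRetries;
  }
}
{code}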
[jira] [Commented] (YARN-2464) Provide Hadoop as a local resource (on HDFS) which can be used by other projects
[ https://issues.apache.org/jira/browse/YARN-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130283#comment-14130283 ] Junping Du commented on YARN-2464: -- Synced with Sid offline and I will take on this JIRA. A couple of thoughts: - We should leverage YARN's resource localizing mechanism to locate the Hadoop jar from a configurable place on HDFS. - The AM should have an explicit API to set the HDFS path of the Hadoop jar (against a specific version) for this application, as well as the classpath, just like what we do for MR in MAPREDUCE-4421. - The Hadoop jar gets loaded to the NM's local directories as a public resource and can be shared by different apps running on the same node, but different versions of the Hadoop jar should have different names. If the same name arrives with different content (detected by checking size, etc.), it should warn with an exception. - With this, apps running on top of YARN don't rely on the Hadoop jars installed on each node, which benefits the rolling upgrade feature. Provide Hadoop as a local resource (on HDFS) which can be used by other projects Key: YARN-2464 URL: https://issues.apache.org/jira/browse/YARN-2464 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Assignee: Junping Du DEFAULT_YARN_APPLICATION_CLASSPATH is used by YARN projects to set up their AM / task classpaths if they have a dependency on Hadoop libraries. It'll be useful to provide similar access to a Hadoop tarball (Hadoop libs, native libraries) etc, which could be used instead - for applications which do not want to rely upon Hadoop versions from a cluster node. This would also require functionality to update the classpath/env for the apps based on the structure of the tar. As an example, MR has support for a full tar (for rolling upgrades). Similarly, Tez ships hadoop libraries along with its build. I'm not sure about the Spark / Storm / HBase model for this - but using a common copy instead of everyone localizing Hadoop libraries would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
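As a rough illustration of the localization idea above, a minimal sketch assuming the tarball already sits at a known HDFS path; the link name, path handling, and helper class are made up for illustration, not the eventual design:
{code}
import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class HadoopTarballResource {
  // Registers a Hadoop tarball on HDFS as a PUBLIC archive so the NM localizes
  // it once and shares it across applications on the same node.
  static void addHadoopTarball(FileSystem fs, Path tarball,
      Map<String, LocalResource> localResources) throws IOException {
    FileStatus status = fs.getFileStatus(tarball);
    LocalResource res = LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(tarball),
        LocalResourceType.ARCHIVE,            // unpacked on localization
        LocalResourceVisibility.PUBLIC,       // shared by apps on the node
        status.getLen(), status.getModificationTime());
    localResources.put("hadoop", res);        // illustrative link name
  }
}
{code}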
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130292#comment-14130292 ] Junping Du commented on YARN-2033: -- Seems not work. [~zjshen], can you provide a new one here? Thanks! Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.10.patch, YARN-2033.11.patch, YARN-2033.12.patch, YARN-2033.13.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, YARN-2033.9.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.13.patch Fix the conflicts against the latest trunk Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.10.patch, YARN-2033.11.patch, YARN-2033.12.patch, YARN-2033.13.patch, YARN-2033.13.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, YARN-2033.9.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130368#comment-14130368 ] Andrey Klochkov commented on YARN-415: -- [~eepayne], congratulations and thanks for this tremendous amount of persistence! :-) Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Fix For: 2.6.0 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.201409102216.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130383#comment-14130383 ] Benoy Antony commented on YARN-2527: {code} this.applicationACLsManager.addApplication(applicationId, submissionContext.getAMContainerSpec().getApplicationACLs()); {code} My guess is that {code}submissionContext.getAMContainerSpec().getApplicationACLs(){code} could be null for some Application types. (non mapreduce). NPE in ApplicationACLsManager - Key: YARN-2527 URL: https://issues.apache.org/jira/browse/YARN-2527 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: YARN-2527.patch, YARN-2527.patch NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. The relevant stacktrace snippet from the ResourceManager logs is as below {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) {code} This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
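One possible guard along the lines Benoy suggests (a hedged sketch, not the attached YARN-2527 patch) would fall back to an empty ACL map when the AM container spec carries none:
{code}
import java.util.Collections;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;

final class AclDefaults {
  // Returns the given ACL map, or an empty map when the submission context
  // carried no ACLs (e.g. some non-MapReduce application types).
  static Map<ApplicationAccessType, String> orEmpty(
      Map<ApplicationAccessType, String> acls) {
    return acls == null
        ? Collections.<ApplicationAccessType, String>emptyMap()
        : acls;
  }
}
{code}
The caller quoted above could then pass AclDefaults.orEmpty(submissionContext.getAMContainerSpec().getApplicationACLs()) to addApplication, though whether to mask a missing map this way or keep the stricter behavior is exactly the trade-off discussed in the earlier comments.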
[jira] [Updated] (YARN-415) Capture aggregate memory allocation at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-415: - Assignee: Eric Payne (was: Andrey Klochkov) Capture aggregate memory allocation at the app-level for chargeback --- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 2.5.0 Reporter: Kendall Thrapp Assignee: Eric Payne Fix For: 2.6.0 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.201408150030.txt, YARN-415.201408181938.txt, YARN-415.201408181938.txt, YARN-415.201408212033.txt, YARN-415.201409040036.txt, YARN-415.201409092204.txt, YARN-415.201409102216.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
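The MB-seconds formula in the description reduces to a simple sum over containers; a toy illustration (not RM code, with invented container values):
{code}
public class MemorySecondsExample {
  // sum over containers of (reserved MB * lifetime in seconds)
  static long memorySeconds(long[] reservedMb, long[] lifetimeSec) {
    long total = 0;
    for (int i = 0; i < reservedMb.length; i++) {
      total += reservedMb[i] * lifetimeSec[i];
    }
    return total;
  }

  public static void main(String[] args) {
    // two 2048 MB containers that each ran for 600 s -> 2,457,600 MB-seconds
    System.out.println(memorySeconds(new long[]{2048, 2048}, new long[]{600, 600}));
  }
}
{code}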
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130479#comment-14130479 ] Hadoop QA commented on YARN-2033: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668114/YARN-2033.13.patch against trunk revision bf64fce. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4893//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4893//console This message is automatically generated. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.10.patch, YARN-2033.11.patch, YARN-2033.12.patch, YARN-2033.13.patch, YARN-2033.13.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.8.patch, YARN-2033.9.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amicable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client side interfaces as close to what we have today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2534) FairScheduler: Potential integer overflow calculating totalMaxShare
[ https://issues.apache.org/jira/browse/YARN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2534: --- Summary: FairScheduler: Potential integer overflow calculating totalMaxShare (was: FairScheduler: totalMaxShare is not calculated correctly in computeSharesInternal) FairScheduler: Potential integer overflow calculating totalMaxShare --- Key: YARN-2534 URL: https://issues.apache.org/jira/browse/YARN-2534 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2534.000.patch FairScheduler: totalMaxShare is not calculated correctly in computeSharesInternal for some cases. If the sum of the max shares of all Schedulables is more than Integer.MAX_VALUE, but each individual max share is not equal to Integer.MAX_VALUE, then totalMaxShare will be a negative value, which causes all fair shares to be calculated incorrectly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2534) FairScheduler: Potential integer overflow calculating totalMaxShare
[ https://issues.apache.org/jira/browse/YARN-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130519#comment-14130519 ] Karthik Kambatla commented on YARN-2534: Pretty straightforward fix. +1. FairScheduler: Potential integer overflow calculating totalMaxShare --- Key: YARN-2534 URL: https://issues.apache.org/jira/browse/YARN-2534 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2534.000.patch FairScheduler: totalMaxShare is not calculated correctly in computeSharesInternal for some cases. If the sum of the max shares of all Schedulables is more than Integer.MAX_VALUE, but each individual max share is not equal to Integer.MAX_VALUE, then totalMaxShare will be a negative value, which causes all fair shares to be calculated incorrectly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
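For readers following along, a minimal sketch of the overflow and one way to avoid it, assuming each Schedulable exposes an int max share (illustrative, not the attached patch):
{code}
public class TotalMaxShare {
  // Summing int max shares into an int can wrap negative once the total
  // exceeds Integer.MAX_VALUE; accumulating in a long and capping avoids that.
  static int totalMaxShare(int[] maxShares) {
    long total = 0L;
    for (int share : maxShares) {
      total += share;
      if (total >= Integer.MAX_VALUE) {
        return Integer.MAX_VALUE;   // saturate instead of overflowing
      }
    }
    return (int) total;
  }
}
{code}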
[jira] [Resolved] (YARN-1104) NMs to support rolling logs of stdout stderr
[ https://issues.apache.org/jira/browse/YARN-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1104. --- Resolution: Duplicate This is an older ticket, but I am closing this as dup of YARN-2468 where patches are already posted, and the work is in progress.. NMs to support rolling logs of stdout stderr -- Key: YARN-1104 URL: https://issues.apache.org/jira/browse/YARN-1104 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Steve Loughran Assignee: Xuan Gong Currently NMs stream the stdout and stderr streams of a container to a file. For longer lived processes those files need to be rotated so that the log doesn't overflow -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130554#comment-14130554 ] Anubhav Dhoot commented on YARN-1372: - bq. Why add context.getContainers().remove(cid); in the removeVeryOldStoppedContainersFromContext method? Won't this remove the containers from context immediately when we send the container statuses across, which contradicts the rest of the changes? Two reasons. 1) To enforce a timeout for the context entries as a safety net (this should happen only after the timeout, which defaults to 10 min). Otherwise I am worried there will be cases where the entries do not get removed for a long time. Is it possible that for some reason there is no ack from AM and the application never gets removed and these entries stay in memory? 2) If we are removing it from the nm store, is there any value in keeping it in memory? If NM restarts, it's not going to know about this anyway. Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1372.001.patch, YARN-1372.001.patch, YARN-1372.002_NMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-register with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AM's about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2484) FileSystemRMStateStore#readFile/writeFile should close FSData(In|Out)putStream in final block
[ https://issues.apache.org/jira/browse/YARN-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130560#comment-14130560 ] Jason Lowe commented on YARN-2484: -- Thanks for the report and patch, Tsuyoshi! I'm a bit concerned that calling close() after encountering an error could itself throw another exception and obscure the original error. So we may want to do something more like this instead: {code} try { ... f.close(); f = null; } finally { IOUtils.cleanup(LOG, f); } {code} Also in the read case I'm not sure we care about the result of close(), since we have all of the data we need. I don't want to take down the RM because there was an error closing a file that we completely read. FileSystemRMStateStore#readFile/writeFile should close FSData(In|Out)putStream in final block - Key: YARN-2484 URL: https://issues.apache.org/jira/browse/YARN-2484 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Priority: Trivial Attachments: YARN-2484.1.patch File descriptors can leak if exceptions are thrown in these methods. {code} private byte[] readFile(Path inputPath, long len) throws Exception { FSDataInputStream fsIn = fs.open(inputPath); // state data will not be that long byte[] data = new byte[(int)len]; fsIn.readFully(data); fsIn.close(); return data; } {code} {code} private void writeFile(Path outputPath, byte[] data) throws Exception { Path tempPath = new Path(outputPath.getParent(), outputPath.getName() + .tmp); FSDataOutputStream fsOut = null; // This file will be overwritten when app/attempt finishes for saving the // final status. fsOut = fs.create(tempPath, true); fsOut.write(data); fsOut.close(); fs.rename(tempPath, outputPath); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
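Putting Jason's suggestion together, a hedged sketch of what readFile might look like, reusing the fs and LOG fields from the quoted methods (not the committed patch):
{code}
// requires org.apache.hadoop.io.IOUtils and org.apache.hadoop.fs.FSDataInputStream
private byte[] readFile(Path inputPath, long len) throws Exception {
  FSDataInputStream fsIn = null;
  try {
    fsIn = fs.open(inputPath);
    // state data will not be that long
    byte[] data = new byte[(int) len];
    fsIn.readFully(data);
    return data;
  } finally {
    // a failed close() can no longer mask the original error, and a close
    // failure after a successful read is only logged, not rethrown
    IOUtils.cleanup(LOG, fsIn);
  }
}
{code}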
[jira] [Updated] (YARN-2538) Add logs when RM send new AMRMToken to ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2538: Attachment: YARN-2538.1.patch Trivial patch Add logs when RM send new AMRMToken to ApplicationMaster Key: YARN-2538 URL: https://issues.apache.org/jira/browse/YARN-2538 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2538.1.patch This is for testing/debugging purpose -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2357) Port Windows Secure Container Executor YARN-1063, YARN-1972, YARN-2198 changes to branch-2
[ https://issues.apache.org/jira/browse/YARN-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2357: --- Attachment: YARN-2357.2.patch .2.patch is a port of YARN-2198.trunk.4.patch. It applies cleanly to branch-2 and builds fine. Port Windows Secure Container Executor YARN-1063, YARN-1972, YARN-2198 changes to branch-2 -- Key: YARN-2357 URL: https://issues.apache.org/jira/browse/YARN-2357 Project: Hadoop YARN Issue Type: Task Components: nodemanager Affects Versions: 2.4.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Labels: security, windows Attachments: YARN-2357.1.patch, YARN-2357.2.patch As title says. Once YARN-1063, YARN-1972 and YARN-2198 are committed to trunk, they need to be backported to branch-2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130691#comment-14130691 ] Jian He commented on YARN-1372: --- Thanks for your explanation. bq. Is it possible that for some reason there is no ack from AM and the application never gets removed and these entries stay in memory? ApplicationImpl on the NM should be guaranteed to be cleaned up for already completed applications. (Otherwise, it's a leak; we should fix this too.) bq. If we are removing it from the nm store, is there any value in keeping it in memory? If NM restarts, it's not going to know about this anyway. That's why I said in my previous comment: {{make sure context.getNMStateStore().removeContainer(cid); is called after receiving the notification from RM as well.}} One other thing: - In RMAppAttemptImpl#pullJustFinishedContainers, we may just send the whole list of containers in one event, instead of sending an individual event for each container. {code} for (Map.Entry<ContainerStatus, NodeId> finishedContainerStatus : this.finishedContainersSentToAM.entrySet()) { // Implicitly acks the previous list as being received by the AM eventHandler.handle(new RMNodeCleanedupContainerNotifiedEvent( finishedContainerStatus.getValue(), finishedContainerStatus.getKey().getContainerId())); } {code} Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1372.001.patch, YARN-1372.001.patch, YARN-1372.002_NMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-register with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AM's about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
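A hedged sketch of the "one event per node instead of per container" suggestion above; it reuses the fields from the quoted loop, and the batched event class name is hypothetical, not an existing YARN API:
{code}
// group the acknowledged containers by node, then emit one event per node
Map<NodeId, List<ContainerId>> ackedPerNode = new HashMap<NodeId, List<ContainerId>>();
for (Map.Entry<ContainerStatus, NodeId> e : finishedContainersSentToAM.entrySet()) {
  List<ContainerId> ids = ackedPerNode.get(e.getValue());
  if (ids == null) {
    ids = new ArrayList<ContainerId>();
    ackedPerNode.put(e.getValue(), ids);
  }
  ids.add(e.getKey().getContainerId());
}
for (Map.Entry<NodeId, List<ContainerId>> e : ackedPerNode.entrySet()) {
  // one batched notification per node, implicitly acking every container the AM pulled
  eventHandler.handle(new RMNodeContainersPulledByAMEvent(e.getKey(), e.getValue()));
}
{code}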
[jira] [Updated] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2032: Attachment: YARN-2032-091114.patch Hi everyone, based on [~mayank_bansal]'s patch, I've done an updated version for the HBase timeline storage. The general strategy and data schema are kept the same as in the original patch. Here's what I've done: 1. Finished the fromTs and fromId functionality in getEntities. For fromId, I added one more check to change the first record of a scan if necessary. For fromTs, I added a separate column qualifier in the entity table to store the insert time for each entity, and, during query, filter out later records if necessary. 2. I restructured the code such that the data schema and operations are decoupled from the actual HBase operations. Most timeline data storage logic is in the abstract store class now, while the HBase storage class only needs to implement the abstract methods and interfaces required by the abstract storage class, including table creation, get, put, scan, and a few other helper functions. I hope this pluggable interface will simplify future extensions and help provide a unified abstract storage schema across different data storages. Comments on this design would certainly be more than welcome. 3. I've added two more unit test cases, to test fromId and fromTs, for HBase storage. Currently, the UTs work fine with branch-2, or with an HttpServer.java accessible by HBaseClient, but the UTs are failing on trunk since HttpServer.java has been replaced. I added Ignore tags to the UTs for now, but feel free to check them under branch-2. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
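A rough sketch of the split Li Lu describes, with hypothetical class and method names (the real abstraction lives in the attached patch):
{code}
import java.io.IOException;
import java.util.Iterator;

// Backend-independent timeline logic would live here; a concrete backend such
// as HBase only implements the low-level primitives below.
abstract class AbstractTimelineStoreSketch {
  protected abstract void createTables() throws IOException;
  protected abstract void put(byte[] row, byte[] qualifier, byte[] value) throws IOException;
  protected abstract byte[] get(byte[] row, byte[] qualifier) throws IOException;
  protected abstract Iterator<byte[]> scan(byte[] startRow, byte[] stopRow) throws IOException;
  // Shared logic such as getEntities() with fromId/fromTs handling would be
  // implemented once here, on top of the primitives above.
}
{code}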
[jira] [Commented] (YARN-2538) Add logs when RM send new AMRMToken to ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130800#comment-14130800 ] Hadoop QA commented on YARN-2538: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668150/YARN-2538.1.patch against trunk revision c656d7d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4894//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4894//console This message is automatically generated. Add logs when RM send new AMRMToken to ApplicationMaster Key: YARN-2538 URL: https://issues.apache.org/jira/browse/YARN-2538 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2538.1.patch This is for testing/debugging purpose -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130838#comment-14130838 ] Jian He commented on YARN-2229: --- looks good, +1 ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.12.patch, YARN-2229.13.patch, YARN-2229.14.patch, YARN-2229.15.patch, YARN-2229.16.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch On YARN-2052, we changed containerId format: upper 10 bits are for epoch, lower 22 bits are for sequence number of Ids. This is for preserving semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM restarts 1024 times. To avoid the problem, its better to make containerId long. We need to define the new format of container Id with preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
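To make the id layout in the description concrete, a small illustration of the 10-bit epoch / 22-bit sequence packing and why 1024 restarts overflow it; the field widths come from the description, the helper itself is hypothetical:
{code}
public class ContainerIdLayout {
  // lower 22 bits: per-epoch sequence number, upper bits: RM restart epoch
  static long packContainerId(long epoch, long sequence) {
    return (epoch << 22) | (sequence & 0x3FFFFF);
  }

  public static void main(String[] args) {
    // In a 32-bit id only 10 bits remain for the epoch, so it wraps after
    // 2^10 = 1024 RM restarts; a 64-bit id removes that limit.
    System.out.println(packContainerId(1023, 1));
  }
}
{code}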
[jira] [Created] (YARN-2539) FairScheduler: Update the default value for maxAMShare
Wei Yan created YARN-2539: - Summary: FairScheduler: Update the default value for maxAMShare Key: YARN-2539 URL: https://issues.apache.org/jira/browse/YARN-2539 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Currently, the maxAMShare per queue is -1 in default, which disables the AM share constraint. Change to 0.5f would be good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2001: -- Attachment: YARN-2001.3.patch patch rebased Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch After failover, RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain amount of nodes. i.e. RM waits until a certain amount of nodes joining before accepting new container requests. Or it could simply be a timeout, only after the timeout RM accepts new requests. NMs joined after the threshold can be treated as new NMs and instructed to kill all its containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2539) FairScheduler: Update the default value for maxAMShare
[ https://issues.apache.org/jira/browse/YARN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130881#comment-14130881 ] Ashwin Shankar commented on YARN-2539: -- Hey [~ywskycn], what about the problem described in YARN-2187 due to maxAMShare being enabled by default? FairScheduler: Update the default value for maxAMShare -- Key: YARN-2539 URL: https://issues.apache.org/jira/browse/YARN-2539 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Currently, the maxAMShare per queue is -1 in default, which disables the AM share constraint. Change to 0.5f would be good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2538) Add logs when RM send new AMRMToken to ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2538: Attachment: YARN-2538.1.patch Add logs when RM send new AMRMToken to ApplicationMaster Key: YARN-2538 URL: https://issues.apache.org/jira/browse/YARN-2538 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2538.1.patch, YARN-2538.1.patch This is for testing/debugging purpose -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130889#comment-14130889 ] Ashwin Shankar commented on YARN-2540: -- Note : this problem happens when we want to filter by any queue, not just root.default. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2539) FairScheduler: Update the default value for maxAMShare
[ https://issues.apache.org/jira/browse/YARN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130891#comment-14130891 ] Wei Yan commented on YARN-2539: --- YARN-2187 was a temporary solution to disable the max AM share, because previously the fair share was the steady fair share. The steady fair share can be very small if we have many queues. Now that we compute the max AM share with the dynamic fair share, we can reset the default value to a reasonable one. Also, there may be a bug in the maxAMShare calculation: when a queue receives its first application, its fair share is still 0, which means the queue cannot accept the AM container. I'm double-checking this problem. FairScheduler: Update the default value for maxAMShare -- Key: YARN-2539 URL: https://issues.apache.org/jira/browse/YARN-2539 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Currently, the maxAMShare per queue is -1 in default, which disables the AM share constraint. Change to 0.5f would be good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1708: --- Attachment: YARN-1708.patch Rebased patch after sync-ing branch yarn-1051 with trunk Add a public API to reserve resources (part of YARN-1051) - Key: YARN-1708 URL: https://issues.apache.org/jira/browse/YARN-1708 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch, YARN-1708.patch This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1707: --- Attachment: YARN-1707.10.patch Rebased patch after sync-ing branch yarn-1051 with trunk Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.10.patch, YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch The CapacityScheduler is a rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this require the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity())= 100% instead of ==100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1709: --- Attachment: YARN-1709.patch Rebased patch after sync-ing branch yarn-1051 with trunk Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1711: --- Attachment: YARN-1711.3.patch Rebased patch after sync-ing branch yarn-1051 with trunk CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2080: --- Attachment: YARN-2080.patch Rebased patch after sync-ing branch yarn-1051 with trunk Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: YARN-796.node-label.consolidate.3.patch Uploaded a new consolidated patch against latest trunk for you to play. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.consolidate.2.patch, YARN-796.node-label.consolidate.3.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4 It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130918#comment-14130918 ] Hadoop QA commented on YARN-1709: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668205/YARN-1709.patch against trunk revision 6c08339. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4898//console This message is automatically generated. Admission Control: Reservation subsystem Key: YARN-1709 URL: https://issues.apache.org/jira/browse/YARN-1709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch This JIRA is about the key data structure used to track resources over time to enable YARN-1051. The Reservation subsystem is conceptually a plan of how the scheduler will allocate resources over-time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130920#comment-14130920 ] Hadoop QA commented on YARN-1707: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668203/YARN-1707.10.patch against trunk revision 6c08339. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4899//console This message is automatically generated. Making the CapacityScheduler more dynamic - Key: YARN-1707 URL: https://issues.apache.org/jira/browse/YARN-1707 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Labels: capacity-scheduler Attachments: YARN-1707.10.patch, YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch The CapacityScheduler is a rather static at the moment, and refreshqueue provides a rather heavy-handed way to reconfigure it. Moving towards long-running services (tracked in YARN-896) and to enable more advanced admission control and resource parcelling we need to make the CapacityScheduler more dynamic. This is instrumental to the umbrella jira YARN-1051. Concretely this require the following changes: * create queues dynamically * destroy queues dynamically * dynamically change queue parameters (e.g., capacity) * modify refreshqueue validation to enforce sum(child.getCapacity())= 100% instead of ==100% We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130923#comment-14130923 ] Hadoop QA commented on YARN-1708: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668202/YARN-1708.patch against trunk revision 6c08339. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4900//console This message is automatically generated. Add a public API to reserve resources (part of YARN-1051) - Key: YARN-1708 URL: https://issues.apache.org/jira/browse/YARN-1708 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Subramaniam Krishnan Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch, YARN-1708.patch This JIRA tracks the definition of a new public API for YARN, which allows users to reserve resources (think of time-bounded queues). This is part of the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130936#comment-14130936 ] Hadoop QA commented on YARN-2001: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668193/YARN-2001.3.patch against trunk revision 6c08339. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4896//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4896//console This message is automatically generated. Threshold for RM to accept requests from AM after failover -- Key: YARN-2001 URL: https://issues.apache.org/jira/browse/YARN-2001 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2001.1.patch, YARN-2001.2.patch, YARN-2001.3.patch After failover, RM may require a certain threshold to determine whether it’s safe to make scheduling decisions and start accepting new container requests from AMs. The threshold could be a certain amount of nodes. i.e. RM waits until a certain amount of nodes joining before accepting new container requests. Or it could simply be a timeout, only after the timeout RM accepts new requests. NMs joined after the threshold can be treated as new NMs and instructed to kill all its containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130935#comment-14130935 ] Hadoop QA commented on YARN-2032: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668169/YARN-2032-091114.patch against trunk revision c656d7d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 24 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1265 javac compiler warnings (more than the trunk's current 1264 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.mover.TestStorageMover org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4895//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4895//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4895//console This message is automatically generated. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2539) FairScheduler: Update the default value for maxAMShare
[ https://issues.apache.org/jira/browse/YARN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2539: -- Attachment: YARN-2539-1.patch FairScheduler: Update the default value for maxAMShare -- Key: YARN-2539 URL: https://issues.apache.org/jira/browse/YARN-2539 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2539-1.patch Currently, the maxAMShare per queue is -1 by default, which disables the AM share constraint. Changing it to 0.5f would be good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2539) FairScheduler: Update the default value for maxAMShare
[ https://issues.apache.org/jira/browse/YARN-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130942#comment-14130942 ] Wei Yan commented on YARN-2539: --- bq. And, there may be a bug in the maxAMShare calculation. When one queue receives the first application, its fair share is still 0, which means the queue cannot accept the AM container. I'm double checking this problem. This cannot happen. The only concern is that the first application submitted to an empty queue may need to wait for 0.5 seconds. The updateThread will update the fair share from 0 to a new value; only after that does the application have a chance to run. FairScheduler: Update the default value for maxAMShare -- Key: YARN-2539 URL: https://issues.apache.org/jira/browse/YARN-2539 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2539-1.patch Currently, the maxAMShare per queue is -1 by default, which disables the AM share constraint. Changing it to 0.5f would be good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
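For readers following the maxAMShare discussion above, here is a simplified Java sketch of the kind of check being described: -1 disables the constraint, otherwise a new AM container is admitted only while AM usage stays within maxAMShare of the queue's fair share. This is a sketch under those assumptions, not the actual FairScheduler implementation from the patch.
{code}
// Simplified sketch of the AM-share admission check; not the real
// FairScheduler code. A fair share of 0 (queue not yet updated by the
// update thread) makes the check fail until the next update cycle,
// which is the ~0.5 second wait mentioned above.
public final class AmShareCheck {
  private AmShareCheck() {}

  public static boolean canRunAm(double maxAMShare,
                                 long queueFairShareMb,
                                 long amUsageMb,
                                 long newAmMb) {
    if (maxAMShare < 0) {
      return true; // -1 (the old default) disables the constraint
    }
    return amUsageMb + newAmMb <= maxAMShare * queueFairShareMb;
  }
}
{code}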
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130945#comment-14130945 ] Hadoop QA commented on YARN-1711: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668207/YARN-1711.3.patch against trunk revision 6c08339. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4901//console This message is automatically generated. CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130946#comment-14130946 ] Hadoop QA commented on YARN-2080: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668208/YARN-2080.patch against trunk revision 6c08339. {color:red}-1 @author{color}. The patch appears to contain @author tags which the Hadoop community has agreed to not allow in code contributions. {color:green}+1 tests included{color}. The patch appears to include new or modified test files. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4902//console This message is automatically generated. Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subramaniam Krishnan Assignee: Subramaniam Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2541) Fix ResourceManagerRest.apt.vm syntax error
Jian He created YARN-2541: - Summary: Fix ResourceManagerRest.apt.vm syntax error Key: YARN-2541 URL: https://issues.apache.org/jira/browse/YARN-2541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2541) Fix ResourceManagerRest.apt.vm syntax error
[ https://issues.apache.org/jira/browse/YARN-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2541: -- Attachment: YARN-2541.1.patch Patch to fix the table syntax Fix ResourceManagerRest.apt.vm syntax error --- Key: YARN-2541 URL: https://issues.apache.org/jira/browse/YARN-2541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-2541.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2541) Fix ResourceManagerRest.apt.vm syntax error
[ https://issues.apache.org/jira/browse/YARN-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130955#comment-14130955 ] Karthik Kambatla commented on YARN-2541: +1, pending Jenkins. Fix ResourceManagerRest.apt.vm syntax error --- Key: YARN-2541 URL: https://issues.apache.org/jira/browse/YARN-2541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-2541.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2538) Add logs when RM send new AMRMToken to ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130958#comment-14130958 ] Hadoop QA commented on YARN-2538: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668200/YARN-2538.1.patch against trunk revision 6c08339. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4897//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4897//console This message is automatically generated. Add logs when RM send new AMRMToken to ApplicationMaster Key: YARN-2538 URL: https://issues.apache.org/jira/browse/YARN-2538 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2538.1.patch, YARN-2538.1.patch This is for testing/debugging purpose -- This message was sent by Atlassian JIRA (v6.3.4#6332)
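The YARN-2538 patch itself is not reproduced in this digest, so as a hedged illustration only, the change presumably adds something like the following kind of log statement where the RM hands a rolled-over AMRMToken back to an AM; the class name and message text here are assumptions, not the patch's actual code.
{code}
// Illustrative only -- the real patch is not shown in this digest.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AmRmTokenLogExample {
  private static final Log LOG = LogFactory.getLog(AmRmTokenLogExample.class);

  // Something of this shape would run when a new AMRMToken is sent
  // to the application master, to aid testing/debugging.
  public void onNewToken(String appAttemptId) {
    LOG.info("Sending new AMRMToken to " + appAttemptId);
  }
}
{code}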
[jira] [Updated] (YARN-2541) Fix ResourceManagerRest.apt.vm syntax error
[ https://issues.apache.org/jira/browse/YARN-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2541: -- Description: The incorrect table syntax somehow causes intermittent hadoop-yarn-site build failures, as in https://jira.codehaus.org/browse/DOXIA-453 Fix ResourceManagerRest.apt.vm syntax error --- Key: YARN-2541 URL: https://issues.apache.org/jira/browse/YARN-2541 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-2541.1.patch The incorrect table syntax somehow causes intermittent hadoop-yarn-site build failures, as in https://jira.codehaus.org/browse/DOXIA-453 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-1712: --- Attachment: YARN-1712.4.patch Thanks [~leftnoteasy] for reviewing the patch. I have wrapped the debug logs with _isDebugEnabled()_. This patch has also been rebased after syncing branch yarn-1051 with trunk. Admission Control: plan follower Key: YARN-1712 URL: https://issues.apache.org/jira/browse/YARN-1712 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations, scheduler Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, YARN-1712.4.patch, YARN-1712.patch This JIRA tracks a thread that continuously propagates the current state of an inventory subsystem to the scheduler. As the inventory subsystem stores the plan of how the resources should be subdivided, the work we propose in this JIRA realizes that plan by dynamically instructing the CapacityScheduler to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
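For context, the _isDebugEnabled()_ wrapping mentioned in the comment above refers to the standard commons-logging guard pattern used throughout Hadoop at the time: skip building the debug string unless debug logging is actually on. A minimal sketch (example class and message are illustrative, not taken from the patch):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DebugGuardExample {
  private static final Log LOG = LogFactory.getLog(DebugGuardExample.class);

  public void report(String queueName, long capacity) {
    // Guard avoids the cost of string concatenation when DEBUG is off.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Resizing queue " + queueName + " to capacity " + capacity);
    }
  }
}
{code}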
[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Krishnan updated YARN-2475: --- Attachment: YARN-2475.patch Thanks [~chris.douglas] for reviewing the patch. I am uploading a patch that addresses all your comments (skipping relisting them). bq. Why is the enforcement window tied to CapacitySchedulerConfiguration? The replanner can be configured per plan which in turn translates to a leaf queue in capacity scheduler configuration. Consequently the enforcement window is configured for the replanner via the capacity scheduler leaf queue configuration. ReservationSystem: replan upon capacity reduction - Key: YARN-2475 URL: https://issues.apache.org/jira/browse/YARN-2475 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Attachments: YARN-2475.patch, YARN-2475.patch In the context of YARN-1051, if capacity of the cluster drops significantly upon machine failures we need to trigger a reorganization of the planned reservations. As reservations are absolute it is possible that they will not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
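The comment above describes a per-plan replanner whose enforcement window is read from the capacity scheduler leaf queue configuration. As a hedged sketch of that lookup pattern only, the snippet below reads a per-queue property from a Hadoop Configuration; the property suffix is a made-up placeholder, not the key the patch actually introduces.
{code}
// Sketch of a per-leaf-queue configuration lookup; the property name
// ".reservation-enforcement-window" is a placeholder assumption.
import org.apache.hadoop.conf.Configuration;

public class EnforcementWindowLookup {
  private static final String PREFIX = "yarn.scheduler.capacity.";

  public static long getEnforcementWindowMs(Configuration conf,
                                             String queuePath,
                                             long defaultMs) {
    // e.g. yarn.scheduler.capacity.root.<queue>.reservation-enforcement-window
    return conf.getLong(
        PREFIX + queuePath + ".reservation-enforcement-window", defaultMs);
  }
}
{code}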
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130972#comment-14130972 ] Li Lu commented on YARN-2032: - Oh sorry, it should be HDFS-6584. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130971#comment-14130971 ] Li Lu commented on YARN-2032: - The HDFS unit test failures are very confusing here. TestStorageMover is not even in trunk. As pointed out by [~jingzhao], the result of this run may come from HDFS-6564. Apparently something went wrong during this test run. Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.3.4#6332)