[jira] [Commented] (YARN-2882) Add an OPPORTUNISTIC ExecutionType

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072089#comment-15072089
 ] 

Wangda Tan commented on YARN-2882:
--

[~kasha],
No worries. Since they're new APIs and in trunk only, I don't think we have to 
revert or post an addendum patch; any user-facing API changes can be made in 
YARN-4335. I will review YARN-4335 and post comments.

> Add an OPPORTUNISTIC ExecutionType
> --
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0
>
> Attachments: YARN-2882-yarn-2877.001.patch, 
> YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, 
> YARN-2882-yarn-2877.004.patch, YARN-2882.005.patch, yarn-2882.patch
>
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable 
> containers.
> Guaranteed-start containers are the existing containers: they are allocated 
> by the central RM and started immediately once allocated.
> Queueable is a new type of container that can be queued in the NM, so its 
> execution may be arbitrarily delayed.





[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate

2015-12-27 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072104#comment-15072104
 ] 

Inigo Goiri commented on YARN-1013:
---

I can take this one once YARN-1015 is done.

> CS should watch resource utilization of containers and allocate speculative 
> containers if appropriate
> -
>
> Key: YARN-1013
> URL: https://issues.apache.org/jira/browse/YARN-1013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>
> CS should watch resource utilization of containers (provided by NM in 
> heartbeat) and allocate speculative containers (at lower OS priority) if 
> appropriate.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-27 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072107#comment-15072107
 ] 

Inigo Goiri commented on YARN-1011:
---

The doc looks good. I have a couple questions:
# What would be the first policy to implement? I guess we can define it in 
YARN-1015.
# Would it make sense to make over-subscription a global property set by the RM 
instead of per-node?
I think we need a sub-task under this umbrella for the over-subscription 
property.



> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.





[jira] [Commented] (YARN-4315) NaN in Queue percentage for cluster apps page

2015-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072175#comment-15072175
 ] 

Hadoop QA commented on YARN-4315:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 28s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 145m 25s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779581/0002-YARN-4315.patch |
| JIRA Issue | YARN-4315 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |

[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072240#comment-15072240
 ] 

Karthik Kambatla commented on YARN-1011:


bq. For resource oversubscription enable/disable for individual nodes, I think 
it's very important, since some nodes could be more important than others. Do 
you think it is fine to add a configuration item to each NM's yarn-site.xml?
That is exactly the intent. Let us continue this conversation on YARN-4512.

bq. For the scheduler-side implementation, instead of modifying individual 
schedulers, I think we should try to add the over-subscription policy to the 
common scheduling layer, since it doesn't sound very related to any specific 
scheduler implementation.
Makes sense. I doubt there are any scheduler-specific smarts here. If we do 
need to do them separately, it is most likely because our scheduler 
abstractions are not clean.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.





[jira] [Commented] (YARN-4315) NaN in Queue percentage for cluster apps page

2015-12-27 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072194#comment-15072194
 ] 

Bibin A Chundatt commented on YARN-4315:


Test case failures are not related. 

> NaN in Queue percentage for cluster apps page
> -
>
> Key: YARN-4315
> URL: https://issues.apache.org/jira/browse/YARN-4315
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4315.patch, 0002-YARN-4315.patch, Snap1.jpg
>
>
> Steps to reproduce:
> 1. Submit an application.
> 2. Switch the RM and check the percentage of queue usage.
> 3. The queue percentage is shown as NaN.





[jira] [Created] (YARN-4511) Create common scheduling policy for resource over-subscription

2015-12-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4511:


 Summary: Create common scheduling policy for resource 
over-subscription
 Key: YARN-4511
 URL: https://issues.apache.org/jira/browse/YARN-4511
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan








[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072227#comment-15072227
 ] 

Wangda Tan commented on YARN-1011:
--

Thanks [~kasha], and also for the comments from [~elgoiri]. I looked at the 
doc and it looks good.

Some questions/comments:
- For resource oversubscription enable/disable for individual nodes, I think 
it's very important, since some nodes could be more important than others. Do 
you think it is fine to add a configuration item to each NM's yarn-site.xml?
- For the scheduler-side implementation, instead of modifying individual 
schedulers, I think we should try to add the over-subscription policy to the 
common scheduling layer, since it doesn't sound very related to any specific 
scheduler implementation.

I also agree that for the first implementation we can simply assume nodes have 
more resources to use. CS shouldn't have an issue with this assumption.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.





[jira] [Updated] (YARN-4511) Create common scheduling policy for resource over-subscription

2015-12-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4511:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-1011

> Create common scheduling policy for resource over-subscription
> --
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>






[jira] [Commented] (YARN-2882) Add an OPPORTUNISTIC ExecutionType

2015-12-27 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072210#comment-15072210
 ] 

Konstantinos Karanasos commented on YARN-2882:
--

bq. Thanks for picking this up, Inigo Goiri. Hope that is okay with 
Konstantinos Karanasos.
That is fine with me (and I do appreciate the help), given the urgency for 
unblocking YARN-1011, but let's coordinate better next time.

I would have liked to review the patch before we pushed it to trunk.
I am travelling at the moment and have limited connectivity, but will give it a 
look tomorrow.

> Add an OPPORTUNISTIC ExecutionType
> --
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0
>
> Attachments: YARN-2882-yarn-2877.001.patch, 
> YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, 
> YARN-2882-yarn-2877.004.patch, YARN-2882.005.patch, yarn-2882.patch
>
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable 
> containers.
> Guaranteed-start containers are the existing containers: they are allocated 
> by the central RM and started immediately once allocated.
> Queueable is a new type of container that can be queued in the NM, so its 
> execution may be arbitrarily delayed.





[jira] [Commented] (YARN-1014) Configure OOM Killer to kill OPPORTUNISTIC containers first

2015-12-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072224#comment-15072224
 ] 

Karthik Kambatla commented on YARN-1014:


[~asuresh], [~kkaranasos] - is this something we would want in trunk so we can 
share with YARN-2877? 

> Configure OOM Killer to kill OPPORTUNISTIC containers first
> ---
>
> Key: YARN-1014
> URL: https://issues.apache.org/jira/browse/YARN-1014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Arun C Murthy
>Assignee: Karthik Kambatla
>
> YARN-2882 introduces the notion of OPPORTUNISTIC containers. These containers 
> should be killed first should the system run out of memory. 
> -
> Previous description:
> Once the RM allocates 'speculative containers', we need the LCE to schedule 
> them at lower priorities via cgroups.
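For illustration only, a tiny sketch of the victim-selection idea (hypothetical 
types; per the description, the real enforcement would go through the 
LCE/cgroups and the kernel OOM killer, not application-level Java):

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: when the node runs out of memory, pick the
// OPPORTUNISTIC containers as the first kill candidates.
public final class OomVictimSelector {
  enum ExecType { GUARANTEED, OPPORTUNISTIC }

  static final class RunningContainer {
    final String id;
    final ExecType type;
    RunningContainer(String id, ExecType type) {
      this.id = id;
      this.type = type;
    }
  }

  static List<RunningContainer> firstVictims(List<RunningContainer> running) {
    List<RunningContainer> victims = new ArrayList<>();
    for (RunningContainer c : running) {
      if (c.type == ExecType.OPPORTUNISTIC) {
        victims.add(c); // kill these before any GUARANTEED container
      }
    }
    return victims;
  }
}
{code}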





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072223#comment-15072223
 ] 

Karthik Kambatla commented on YARN-1011:


bq. Would it make sense to make over-subscription a global property set by the 
RM instead of per-node?
Good question. I thought about it quite a bit. Here is my reasoning for doing 
it on the NM side; we can always switch to defining it on the RM if that makes 
more sense.
# Even if we have the knob on the RM, the node still has to support it: monitor 
the resource usage on the node and kill the OPPORTUNISTIC containers if need 
be. On a cluster with NMs of different versions (say, during a rolling 
upgrade), the RM would have to keep track of which NMs support 
over-subscription, so we need some config for the NM anyway. Further, there 
could be node-specific conditions - hardware, other services running on the 
node, etc. - that could affect the over-subscription capacity of the node. For 
instance, it might be okay to sign up for 90% of the advertised capacity on 
node A, but only 80% on node B. And this ability to soak up extra work could 
change over time.
# In terms of implementation, the node already sends its capacity and its 
aggregate container utilization. It might as well send an 
oversubscription-percentage, interpreted as a fraction of its advertised 
capacity. E.g., a node with 64 GB of memory could advertise its capacity as 
50 GB and an oversubscription-percentage of 0.9; the RM could then schedule up 
to 45 GB of utilization. An oversubscription-percentage <= 0 would indicate 
the feature is turned off. (A minimal sketch of this arithmetic follows this 
list.)
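A minimal sketch of that arithmetic, with hypothetical names (this is not 
actual RM/NM code):

{code}
// The RM may keep allocating (opportunistic) containers while the node's
// measured utilization stays below this threshold.
static long utilizationThresholdMB(long advertisedCapacityMB,
    float oversubscriptionPercentage) {
  if (oversubscriptionPercentage <= 0) {
    return 0; // <= 0 means over-subscription is turned off
  }
  // e.g. 51200 MB (50 GB advertised) * 0.9 = 46080 MB (45 GB)
  return (long) (advertisedCapacityMB * oversubscriptionPercentage);
}
{code}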

bq. What would be the first policy to implement? I guess we can define it in 
YARN-1015.
The simplest policy would likely be to just assume there are more resources on 
the node and continue allocating with the same policies we use today for 
free/unallocated resources. 
This should work okay for the FairScheduler. I am less familiar with the 
intricate details of CS, but I would think it should apply there as well. 
[~leftnoteasy] - thoughts? 

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.





[jira] [Commented] (YARN-4512) Provide a knob to turn on over-subscription

2015-12-27 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072245#comment-15072245
 ] 

Inigo Goiri commented on YARN-4512:
---

Per the discussion in YARN-1011, it makes sense to add an option in 
yarn-site.xml and have each NM advertise it to the RM. In addition, we should 
have separate over-subscription parameters for each resource (i.e., CPU and 
memory).

> Provide a knob to turn on over-subscription
> ---
>
> Key: YARN-4512
> URL: https://issues.apache.org/jira/browse/YARN-4512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>






[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072228#comment-15072228
 ] 

Wangda Tan commented on YARN-1011:
--

I just created YARN-4511 to track a common scheduling policy for resource 
over-subscription.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.





[jira] [Created] (YARN-4512) Provide a knob to turn on over-subscription

2015-12-27 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4512:
--

 Summary: Provide a knob to turn on over-subscription
 Key: YARN-4512
 URL: https://issues.apache.org/jira/browse/YARN-4512
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla








[jira] [Created] (YARN-4513) [YARN-3368] Upgrade to Ember 2.2.0

2015-12-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4513:


 Summary: [YARN-3368] Upgrade to Ember 2.2.0
 Key: YARN-4513
 URL: https://issues.apache.org/jira/browse/YARN-4513
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan


It currently uses Ember 2.0; we should upgrade it to the latest Ember.





[jira] [Created] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2015-12-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4514:


 Summary: [YARN-3368] Cleanup hardcoded configurations, such as 
RM/ATS addresses
 Key: YARN-4514
 URL: https://issues.apache.org/jira/browse/YARN-4514
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan


Several configurations are hard-coded (for example, the RM/ATS addresses); we 
should make them configurable. 





[jira] [Assigned] (YARN-4518) [YARN-3368] Support rendering statistic-by-node-label for queues/apps page

2015-12-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-4518:
-

Assignee: Sunil G

> [YARN-3368] Support rendering statistic-by-node-label for queues/apps page
> --
>
> Key: YARN-4518
> URL: https://issues.apache.org/jira/browse/YARN-4518
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>






[jira] [Assigned] (YARN-4517) [YARN-3368] Add nodes page

2015-12-27 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-4517:
--

Assignee: Varun Saxena

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>
> We need a nodes page added to the next-generation web UI, similar to the 
> existing RM/nodes page.





[jira] [Updated] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-27 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4519:
---
Description: 
In CapacityScheduler.allocate(), we first take the FiCaSchedulerApp sync lock 
and may then take the CapacityScheduler's sync lock in decreaseContainer().
In the scheduler thread, we first take the CapacityScheduler's sync lock in 
allocateContainersToNode() and may then take the FiCaSchedulerApp sync lock in 
FicaSchedulerApp.assignContainers(). 

  was:
In CapacityScheduler.allocate(), we first take the FiCaSchedulerApp sync lock 
and may then take the CapacityScheduler's sync lock in decreaseContainer().
In the scheduler thread, we first take the CapacityScheduler's sync lock in 
allocateContainersToNode and may then take the FiCaSchedulerApp sync lock in 
FicaSchedulerApp.assignContainers. 
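For illustration, a minimal reproduction of the inverted lock ordering 
described above, with plain Java monitors standing in for the 
CapacityScheduler and FiCaSchedulerApp sync locks (this is not the actual 
scheduler code):

{code}
public class LockOrderDeadlock {
  static final Object schedulerLock = new Object(); // CapacityScheduler
  static final Object appLock = new Object();       // FiCaSchedulerApp

  public static void main(String[] args) {
    // AM allocate path: app lock first, then scheduler lock.
    new Thread(() -> {
      synchronized (appLock) {             // allocate()
        pause();                           // widen the race window
        synchronized (schedulerLock) { }   // decreaseContainer()
      }
    }).start();
    // Scheduler thread: scheduler lock first, then app lock.
    new Thread(() -> {
      synchronized (schedulerLock) {       // allocateContainersToNode()
        pause();
        synchronized (appLock) { }         // assignContainers()
      }
    }).start();
    // Once each thread holds its first lock, both wait forever.
  }

  static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException ignored) { }
  }
}
{code}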


> potential deadlock of CapacityScheduler between decrease container and assign 
> containers
> 
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: sandflee
>
> In CapacityScheduler.allocate(), we first take the FiCaSchedulerApp sync lock 
> and may then take the CapacityScheduler's sync lock in decreaseContainer().
> In the scheduler thread, we first take the CapacityScheduler's sync lock in 
> allocateContainersToNode() and may then take the FiCaSchedulerApp sync lock 
> in FicaSchedulerApp.assignContainers(). 





[jira] [Created] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers

2015-12-27 Thread sandflee (JIRA)
sandflee created YARN-4519:
--

 Summary: potential deadlock of CapacityScheduler between decrease 
container and assign containers
 Key: YARN-4519
 URL: https://issues.apache.org/jira/browse/YARN-4519
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: sandflee


In CapacityScheduler.allocate(), we first take the FiCaSchedulerApp sync lock 
and may then take the CapacityScheduler's sync lock in decreaseContainer().
In the scheduler thread, we first take the CapacityScheduler's sync lock in 
allocateContainersToNode and may then take the FiCaSchedulerApp sync lock in 
FicaSchedulerApp.assignContainers. 





[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072438#comment-15072438
 ] 

Jian He commented on YARN-3480:
---

lgtm,  +1

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, 
> YARN-3480.09.patch, YARN-3480.10.patch, YARN-3480.11.patch, 
> YARN-3480.12.patch, YARN-3480.13.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, 
> that makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and 
> makes the RM recovery process much slower. It might be better to cap the 
> number of attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set 
> to a small value, the number of retried attempts might be very large, so we 
> need to delete some of the attempts stored in the RMStateStore.





[jira] [Commented] (YARN-4438) Implement RM leader election with curator

2015-12-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072436#comment-15072436
 ] 

Jian He commented on YARN-4438:
---

bq. Not sure I understand why ZKRMStateStore needs to be an AlwaysOn service.
It does not need to be always on; just the zkClient in ZKRMStateStore needs to 
be always on.
bq. How would this change look? 
At first glance, in AdminService#transitionToStandby and transitionToActive, we 
would not call refreshAll if the shared-storage-config-provider is not enabled.
bq. Is the concern that Curator may be biased in picking an RM in certain 
conditions?
Yeah, that's just my guess. Immediately rejoining may give it a better chance 
to take leadership again. 
ActiveStandbyElector#reJoinElectionAfterFailureToBecomeActive has similar 
comments. 
bq. If leaderLatch.close() throws an exception, when does Curator realize the 
RM is not participating in the election anymore? 
Based on my understanding, Curator will realize it when it does not hear from 
the RM for the zkSessionTimeout period. Essentially, the zkClient on the RM 
side keeps retrying to notify the ZK quorum that this client is closed. If 
close succeeds, the ZK quorum is notified immediately and re-elects a leader. 
If close keeps retrying beyond zkSessionTimeout, the ZK quorum assumes this 
client died and re-elects a leader.
bq. we might not need that thread.
Then we can remove this thread? I'll do that separately if you agree.
bq. What happens if this RM is subsequently elected leader? Does the transition 
to Active succeed just fine?
I think it can transition to active the next time it is elected leader. The 
previous failure will most likely have happened in refreshAcl.
bq. I would like for us to err on the side of caution and do null-checks.
Will do.
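For reference, a minimal Curator leader-latch sketch (illustrative only, not 
the RM integration; the connection string and election path are made up):

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();
    LeaderLatch latch = new LeaderLatch(client, "/rm-leader-election");
    latch.start();   // join the election
    latch.await();   // block until elected leader
    // ... act as active; on fencing or shutdown:
    latch.close();   // if this fails, ZK only notices after the session
                     // timeout and then elects a new leader
    client.close();
  }
}
{code}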

> Implement RM leader election with curator
> -
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch
>
>
> This is to implement the leader election with curator instead of the 
> ActiveStandbyElector from common package,  this also avoids adding more 
> configs in common to suit RM's own needs. 





[jira] [Commented] (YARN-4518) [YARN-3368] Support rendering statistic-by-node-label for queues/apps page

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072463#comment-15072463
 ] 

Wangda Tan commented on YARN-4518:
--

It's yours :), thanks!

> [YARN-3368] Support rendering statistic-by-node-label for queues/apps page
> --
>
> Key: YARN-4518
> URL: https://issues.apache.org/jira/browse/YARN-4518
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>






[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072295#comment-15072295
 ] 

Wangda Tan commented on YARN-4335:
--

Thanks [~kkaranasos],

The patch generally looks good to me; some nits:
- The Javadoc of getExecutionType is wrong.
- I would prefer to mark the new APIs as unstable so we can update them before 
the feature becomes stable (see the sketch below).
- Javadoc of ExecutionType: since we use ExecutionType for resource requests, 
I would suggest adding a description saying the scheduler can use it to decide 
whether idle resources may be used to satisfy the request.
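For reference, a minimal sketch of what the unstable marking could look like, 
using Hadoop's stability annotations (the class below is a placeholder, not 
the actual patch):

{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Placeholder type: a new user-facing API annotated Unstable so it can
// still change before the feature becomes stable.
@InterfaceAudience.Public
@InterfaceStability.Unstable
public abstract class NewExecutionTypeApi {
  // new API surface would go here
}
{code}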

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-4335-yarn-2877.001.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.
> We will initially support two resource request types: CONSERVATIVE and 
> OPTIMISTIC.
> CONSERVATIVE resource requests will be handed internally to containers of 
> GUARANTEED type, whereas OPTIMISTIC resource requests will be handed to 
> QUEUEABLE containers.





[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072298#comment-15072298
 ] 

Wangda Tan commented on YARN-2885:
--

Hi [~asuresh],
Thanks for updating.

I looked at the latest patch, mainly at the configuration changes and the code 
that interacts with existing RM components. Some comments:
- Do you have a real use case where the distributed scheduler needs to set 
different properties, such as DIST_SCHEDULING_MIN_MEMORY? Since MIN_MEMORY is 
a property the AM needs to know (for the purpose of calculating how many 
resources to request), we would need to tell the AM whenever the local RM's 
MIN_MEMORY differs from the central RM's. I would suggest using the central 
RM's settings for MIN_MEMORY, etc., if you don't have a real use case for now.
- Shouldn't the first constructor of ApplicationMasterService use {{name}} 
instead of {{ApplicationMasterService.class.getName()}}?
- You can add an isDistributedSchedulingEnabled method to YarnConfiguration to 
avoid duplicated logic like:
{code}
boolean isDistSchedulingEnabled =
    conf.getBoolean(YarnConfiguration.DIST_SCHEDULING_ENABLED,
        YarnConfiguration.DIST_SCHEDULING_ENABLED_DEFAULT);
{code}
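A sketch of what that helper could look like (hypothetical method, as 
suggested above; it would live in YarnConfiguration next to the 
DIST_SCHEDULING_* constants):

{code}
public static boolean isDistributedSchedulingEnabled(Configuration conf) {
  return conf.getBoolean(YarnConfiguration.DIST_SCHEDULING_ENABLED,
      YarnConfiguration.DIST_SCHEDULING_ENABLED_DEFAULT);
}
{code}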

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, 
> YARN-2885-yarn-2877.v4.patch, YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.





[jira] [Created] (YARN-4517) [YARN-3368] Add nodes page

2015-12-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4517:


 Summary: [YARN-3368] Add nodes page
 Key: YARN-4517
 URL: https://issues.apache.org/jira/browse/YARN-4517
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan


We need a nodes page added to the next-generation web UI, similar to the 
existing RM/nodes page.





[jira] [Commented] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM

2015-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072373#comment-15072373
 ] 

Sunil G commented on YARN-4515:
---

Hi [~leftnoteasy],
I will try taking this ticket. Please let me know if it's fine.

> [YARN-3368] Support hosting web UI framework inside YARN RM
> ---
>
> Key: YARN-4515
> URL: https://issues.apache.org/jira/browse/YARN-4515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently it can only be launched outside of YARN; we should make it 
> runnable inside YARN for easier testing, and we should have a configuration 
> to enable/disable it.





[jira] [Assigned] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM

2015-12-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-4515:
-

Assignee: Sunil G

> [YARN-3368] Support hosting web UI framework inside YARN RM
> ---
>
> Key: YARN-4515
> URL: https://issues.apache.org/jira/browse/YARN-4515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently it can only be launched outside of YARN; we should make it 
> runnable inside YARN for easier testing, and we should have a configuration 
> to enable/disable it.





[jira] [Commented] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072461#comment-15072461
 ] 

Wangda Tan commented on YARN-4515:
--

Sure! Please go ahead.

Thanks,

> [YARN-3368] Support hosting web UI framework inside YARN RM
> ---
>
> Key: YARN-4515
> URL: https://issues.apache.org/jira/browse/YARN-4515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>
> Currently it can only be launched outside of YARN; we should make it 
> runnable inside YARN for easier testing, and we should have a configuration 
> to enable/disable it.





[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072279#comment-15072279
 ] 

Bikas Saha commented on YARN-1011:
--

In my prior experience, something like this is not practical without proactive 
cpu management (which has been delegated to future work in the document). It is 
essential to run opportunistic tasks at lower OS cpu priority so that they 
never obstruct the progress of normal tasks. Typically we will find that the 
machine is most under-allocated in cpu usage, since most processing has bursty 
cpu. When a normal task has a cpu burst, it should not have to contend with an 
opportunistic task, since that would be detrimental to the expected performance 
of that task. Without this, jobs will not run predictably in the cluster. From 
what I have seen, users prefer predictability over most other things, i.e., 
having a 1-minute job run in 1 minute all the time vs. making that job run in 
30s 85% of the time but in 2 minutes 5% of the time, because that makes it 
really hard to establish SLAs. In fact, this is the litmus test for 
opportunistic scheduling: it should be able to raise the utilization of a 
cluster from (say) 50% to (say) 75% without affecting the latency of the jobs 
compared to when the cluster was running at 50%.

For memory, in fact, it's ok to share and reach 100% capacity, but it's 
important to check that the machine does not start thrashing. Most well-written 
tasks will run within their memory limits and start spilling, etc. 
Opportunistic tasks are trying to occupy the memory that these tasks thought 
they could use but are not using (or that these tasks are keeping as buffer on 
purpose). The crucial thing to consider here is to look for stats that signify 
the onset of memory paging activity (or overall memory over-subscription at 
the OS level). At that point, even normal tasks that are within their limit 
will be adversely affected, because the OS will start paging memory to disk. 
So we need to start proactively killing opportunistic tasks before such paging 
activity gets triggered.
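A minimal sketch of watching for that onset, assuming Linux and the pswpout 
counter in /proc/vmstat (the counter choice and polling scheme are 
illustrative only, not something proposed in the comment):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: detect the onset of swap-out activity from /proc/vmstat.
public class PagingOnsetMonitor {
  private long lastSwapOuts = -1;

  /** True if pages were swapped out since the previous call. */
  public boolean pagingStarted() throws IOException {
    long swapOuts = Files.readAllLines(Paths.get("/proc/vmstat")).stream()
        .filter(line -> line.startsWith("pswpout "))
        .mapToLong(line -> Long.parseLong(line.split("\\s+")[1]))
        .findFirst().orElse(0L);
    boolean onset = lastSwapOuts >= 0 && swapOuts > lastSwapOuts;
    lastSwapOuts = swapOuts;
    return onset; // if true, start killing opportunistic tasks
  }
}
{code}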

Handling opportunistic tasks raises questions about the involvement of the 
AMs; unless I missed something, this is not called out clearly in the doc. In 
that sense it would be instructive to consider opportunistic scheduling in a 
similar light as preemption: the app got a container that it would not have 
gotten at that time if we had been strict, but got it because we decided to 
loosen the strings (of queue capacity or machine capacity, respectively).
- Will opportunistic containers be given only for asks that are beyond queue 
capacity, so that we don't break any guarantees on their liveness? I.e., an AM 
will not expect to lose any container that is within its queue capacity, but 
opportunistic containers can be killed at any time.
- Does the AM need to know that a newly allocated container is opportunistic, 
e.g., so that it does not schedule its highest-priority work on that container?
- Will conversion of opportunistic containers to regular containers be done 
automatically by the RM? Will the RM notify the AM about such conversions?
- When terminating opportunistic containers, will the RM ask the AM which 
containers to kill? Given the perf-related scenarios above, that may not be a 
viable option.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.





[jira] [Updated] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-12-27 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4183:

Attachment: YARN-4183.v1.001.patch

[~sjlee0], attaching a patch as per the previous description; please have a 
look.

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Naganarasimha G R
> Attachments: YARN-4183.1.patch, YARN-4183.v1.001.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store, as it checks whether the timeline server and system metrics publisher 
> are enabled before creating a timeline client.
> To make it work, the timeline service flag has to be turned on, which forces 
> every YARN application to get a delegation token.
> Instead of checking whether the timeline service is enabled, we should be 
> checking whether the application history server is enabled.





[jira] [Assigned] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses

2015-12-27 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-4514:
---

Assignee: Naganarasimha G R

> [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
> --
>
> Key: YARN-4514
> URL: https://issues.apache.org/jira/browse/YARN-4514
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>
> Several configurations are hard-coded (for example, the RM/ATS addresses); 
> we should make them configurable. 





[jira] [Commented] (YARN-4315) NaN in Queue percentage for cluster apps page

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072292#comment-15072292
 ] 

Wangda Tan commented on YARN-4315:
--

Looks good, +1, will commit shortly. Thanks [~bibinchundatt].

> NaN in Queue percentage for cluster apps page
> -
>
> Key: YARN-4315
> URL: https://issues.apache.org/jira/browse/YARN-4315
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4315.patch, 0002-YARN-4315.patch, Snap1.jpg
>
>
> Steps to reproduce:
> 1. Submit an application.
> 2. Switch the RM and check the percentage of queue usage.
> 3. The queue percentage is shown as NaN.





[jira] [Created] (YARN-4516) [YARN-3368] Use em-table to better render tables

2015-12-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4516:


 Summary: [YARN-3368] Use em-table to better render tables
 Key: YARN-4516
 URL: https://issues.apache.org/jira/browse/YARN-4516
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan


Currently we're using DataTables, which isn't integrated with Ember.js very 
well. Instead we can use em-table (which was created for the Tez UI). It 
supports features such as selectable columns, pagination, etc.





[jira] [Created] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM

2015-12-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4515:


 Summary: [YARN-3368] Support hosting web UI framework inside YARN 
RM
 Key: YARN-4515
 URL: https://issues.apache.org/jira/browse/YARN-4515
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan


Currently it can only be launched outside of YARN; we should make it runnable 
inside YARN for easier testing, and we should have a configuration to 
enable/disable it.





[jira] [Updated] (YARN-4516) [YARN-3368] Use em-table to better render tables

2015-12-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4516:
-
Description: 
Currently we're using DataTables, which isn't integrated with Ember.js very 
well. Instead we can use em-table (see 
https://github.com/sreenaths/em-table/wiki; it was created for the Tez UI). It 
supports features such as selectable columns, pagination, etc.

  was:
Currently we're using DataTables, which isn't integrated with Ember.js very 
well. Instead we can use em-table (which was created for the Tez UI). It 
supports features such as selectable columns, pagination, etc.


> [YARN-3368] Use em-table to better render tables
> 
>
> Key: YARN-4516
> URL: https://issues.apache.org/jira/browse/YARN-4516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>
> Currently we're using DataTables, which isn't integrated with Ember.js very 
> well. Instead we can use em-table (see 
> https://github.com/sreenaths/em-table/wiki; it was created for the Tez UI). 
> It supports features such as selectable columns, pagination, etc.





[jira] [Commented] (YARN-4516) [YARN-3368] Use em-table to better render tables

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072307#comment-15072307
 ] 

Wangda Tan commented on YARN-4516:
--

Thanks [~Sreenath] for creating em-table.

> [YARN-3368] Use em-table to better render tables
> 
>
> Key: YARN-4516
> URL: https://issues.apache.org/jira/browse/YARN-4516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>
> Currently we're using DataTables, which isn't integrated with Ember.js very 
> well. Instead we can use em-table (see 
> https://github.com/sreenaths/em-table/wiki; it was created for the Tez UI). 
> It supports features such as selectable columns, pagination, etc.





[jira] [Created] (YARN-4520) FinishAppEvent is leaked in leveldb if no app's container running on this node

2015-12-27 Thread sandflee (JIRA)
sandflee created YARN-4520:
--

 Summary: FinishAppEvent is leaked in leveldb if no app's container 
running on this node
 Key: YARN-4520
 URL: https://issues.apache.org/jira/browse/YARN-4520
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: sandflee
Assignee: sandflee


Once we restart the NodeManager we see many logs like:
2015-12-28 11:59:18,725 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Event EventType: FINISH_APPLICATION sent to absent application 
application_1446103803043_9892

We find that the app's containers were never started on the NM but were 
released by the AM after being allocated. 





[jira] [Updated] (YARN-3368) [Umbrella] YARN web UI: Next generation

2015-12-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3368:
-
Summary: [Umbrella] YARN web UI: Next generation  (was: [Umbrella] Improve 
YARN web UI)

> [Umbrella] YARN web UI: Next generation
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) 
> yarn-ui-screenshots.zip
>
>
> The goal is to improve the YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the 
> new UI.
> This serves as an umbrella JIRA to track the tasks; we can do this in a 
> branch.





[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072326#comment-15072326
 ] 

Wangda Tan commented on YARN-3215:
--

Hi [~Naganarasimha],

Thanks for considering this.

One idea in my mind: can we return to the application the headroom for all 
partitions *requested by the application*? Returning the queue's total 
available resources to the app could over-estimate the headroom; it is 
possible that a queue can use many partitions while the app only wants one.

Thoughts?
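A small sketch of the idea, with hypothetical names (Resource is the YARN 
record type; the maps stand in for whatever the scheduler tracks per 
partition):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical helper: report headroom only for the partitions the
// application actually requested, instead of for the whole queue.
public final class PartitionHeadroom {
  public static Map<String, Resource> headroomByPartition(
      Set<String> requestedPartitions,
      Map<String, Resource> queueAvailableByPartition) {
    Map<String, Resource> headroom = new HashMap<>();
    for (String partition : requestedPartitions) {
      Resource available = queueAvailableByPartition.get(partition);
      if (available != null) {
        headroom.put(partition, available); // e.g. 5G under node-label=red
      }
    }
    return headroom; // partitions the app never asked for are omitted
  }
}
{code}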

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>
> In the existing CapacityScheduler, when computing the headroom of an 
> application, it only considers the "non-labeled" nodes for this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G of resources available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.





[jira] [Commented] (YARN-4518) [YARN-3368] Support rendering statistic-by-node-label for queues/apps page

2015-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072374#comment-15072374
 ] 

Sunil G commented on YARN-4518:
---

Hi [~leftnoteasy],
I will also give this ticket a shot. Please let me know if it's fine.

> [YARN-3368] Support rendering statistic-by-node-label for queues/apps page
> --
>
> Key: YARN-4518
> URL: https://issues.apache.org/jira/browse/YARN-4518
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
>






[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072414#comment-15072414
 ] 

Jian He commented on YARN-4138:
---

I think it may be true that this will lead to a deadlock.
- CapacityScheduler#allocateContainersToNode grabs the scheduler lock and then 
the SchedulerApp's lock at LeafQueue#assignContainers.
- CapacityScheduler#rollbackContainerResource first acquires the 
SchedulerApp's lock and then the scheduler lock.  
-- This will also happen when the AM calls CapacityScheduler#allocate to 
decrease a container; this was introduced in YARN-1651. I had a 
[comment|https://issues.apache.org/jira/browse/YARN-1651?focusedCommentId=14738568=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14738568]
 earlier that every AM allocate call will hold the scheduler and queue locks, 
which is too expensive, but I missed that this may lead to deadlock. 

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.





[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-27 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072426#comment-15072426
 ] 

MENG DING commented on YARN-4138:
-

Releasing containers may have the same issue too. Strange that there have been 
no reports from the field so far. It looks like we need to implement a pending 
release/decrease list in the scheduler ...
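A rough sketch of that pending-list idea, with hypothetical names (the real 
fix would live inside the scheduler; this only shows the lock-ordering point):

{code}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Queue decrease/release requests from the allocate path and drain them
// on the scheduler thread, so locks are always taken in the order
// scheduler -> app and the inverted ordering never occurs.
public abstract class PendingDecreaseList<R> {
  private final Queue<R> pending = new ConcurrentLinkedQueue<>();

  /** Called from the AM allocate path; takes no scheduler lock. */
  public void enqueue(R request) {
    pending.offer(request);
  }

  /** Called on the scheduler thread while holding the scheduler lock. */
  public void drain() {
    R request;
    while ((request = pending.poll()) != null) {
      apply(request); // may take the app lock; ordering stays consistent
    }
  }

  /** Apply one decrease/release request (scheduler-specific). */
  protected abstract void apply(R request);
}
{code}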

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.





[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-27 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072430#comment-15072430
 ] 

sandflee commented on YARN-4138:


When releasing containers, we don't hold the SchedulerApp's lock.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4518) [YARN-3368] Support rendering statistic-by-node-label for queues/apps page

2015-12-27 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4518:


 Summary: [YARN-3368] Support rendering statistic-by-node-label for 
queues/apps page
 Key: YARN-4518
 URL: https://issues.apache.org/jira/browse/YARN-4518
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-27 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072314#comment-15072314
 ] 

sandflee commented on YARN-4138:


Hi [~mding], I'll open a new JIRA to track this, so as not to delay this issue.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4516) [YARN-3368] Use em-table to better render tables

2015-12-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072334#comment-15072334
 ] 

Li Lu commented on YARN-4516:
-

Hi [~leftnoteasy], [~Sreenath], if nobody is currently working on this item, 
maybe I can take it and fine-tune the tables in the ATS v2 branch? 
Thanks! 

> [YARN-3368] Use em-table to better render tables
> 
>
> Key: YARN-4516
> URL: https://issues.apache.org/jira/browse/YARN-4516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>
> Currently we're using DataTables, which isn't integrated with Ember.js very 
> well. Instead we can use em-table (see 
> https://github.com/sreenaths/em-table/wiki, which was created for the Tez 
> UI). It supports features such as selectable columns, pagination, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2015-12-27 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072387#comment-15072387
 ] 

sandflee commented on YARN-4495:


RM will pass an InvalidResourceRequestException to the AM under the following 
conditions:

* deduped container change request
* invalid ContainerChangeRequest: requested container size < 0 or > max
* rmContainer == null
* rmContainer.state != RUNNING
* increase request with targetResource < allocatedResource, or decrease 
request with targetResource > allocatedResource
* nodeResource < increase request's targetResource

This will bring AMRMClientAsync down, which in turn brings the AM down. That 
is not user friendly, especially since some of these conditions are out of 
the AM's control:
* rmContainer == null: maybe the RM is recovering and the corresponding 
RMContainer has not been recovered yet.
* rmContainer.state != RUNNING: maybe the container has completed and the 
completion message has not been pulled by the AM yet.
* increase request with targetResource < allocatedResource, or decrease 
request with targetResource > allocatedResource:
1. the AM requests a resource increase 1G -> 10G; the request cannot be 
satisfied and stays pending
2. after a while, the AM sends a new increase request 1G -> 5G
3. the 10G request is satisfied, so the RMContainer's allocatedResource is 
already 10G when the new increase request reaches the RM
4. the RM checks the increase request and finds the target resource is less 
than the RMContainer's allocated resource
* nodeResource < increase request's targetResource: the AM knows nothing 
about node resources; this should be covered by maximumAllocation.

In addition, the scheduler may drop a container resize request if the target 
resource equals the RMContainer's allocatedResource. The problem is that the 
AM knows nothing about container resource normalization: if the AM requests a 
decrease 8G -> 7.5G, and 7.5G is normalized to 8G, the RM will drop this 
request and leave the AM waiting for a reply.

So, all in all, I suggest adding a message to the AllocateResponse instead of 
throwing InvalidResourceRequestException or dropping the change request (see 
the sketch below).

Hoping for your comments and suggestions!
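
To make the suggestion concrete, here is a minimal sketch (illustrative names 
only, not existing YARN API) of returning per-request rejection reasons in 
the AllocateResponse instead of throwing or silently dropping:

{code}
// Hypothetical error record the RM could attach to the AllocateResponse.
final class ContainerChangeError {
  final String containerId;  // stringified ContainerId, for simplicity
  final String reason;       // e.g. "NOT_RUNNING", "TARGET_EXCEEDS_NODE"

  ContainerChangeError(String containerId, String reason) {
    this.containerId = containerId;
    this.reason = reason;
  }
}

final class ChangeRequestValidator {
  // Returns null when the request is valid; otherwise an error the AM can
  // inspect without AMRMClientAsync blowing up.
  static ContainerChangeError validate(String containerId, boolean running,
      long allocatedMB, long targetMB, long nodeMB) {
    if (!running) {
      return new ContainerChangeError(containerId, "NOT_RUNNING");
    }
    if (targetMB > nodeMB) {
      return new ContainerChangeError(containerId, "TARGET_EXCEEDS_NODE");
    }
    if (targetMB == allocatedMB) {
      // The request became a no-op after normalization; report it rather
      // than dropping it, so the AM is not left waiting for a reply.
      return new ContainerChangeError(containerId, "NO_OP_AFTER_NORMALIZATION");
    }
    return null;  // valid; the scheduler proceeds with the resize
  }
}
{code}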




> add a way to tell AM container increase/decrease request is invalid
> ---
>
> Key: YARN-4495
> URL: https://issues.apache.org/jira/browse/YARN-4495
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
>
> Right now the RM may pass an InvalidResourceRequestException to the AM or 
> just ignore the change request; the former will bring AMRMClientAsync down, 
> and the latter will leave the AM waiting for a reply.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-12-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072303#comment-15072303
 ] 

Naganarasimha G R commented on YARN-4183:
-

We can update the title if the fix is fine.

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Naganarasimha G R
> Attachments: YARN-4183.1.patch, YARN-4183.v1.001.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store, as it checks whether the timeline server and system metrics publisher 
> are enabled before creating a timeline client.
> Turning the timeline service flag on to make this work forces every YARN 
> application to get a delegation token.
> Instead of checking whether the timeline service is enabled, we should be 
> checking whether the application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072302#comment-15072302
 ] 

Wangda Tan commented on YARN-1011:
--

Thanks,

bq. Makes sense. Doubt there is any scheduler-specific smarts here. If at all 
we need to do them separately, it is most likely because our scheduler 
abstractions are not clean.
Agree!

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3368) [Umbrella] YARN web UI: Next generation

2015-12-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072309#comment-15072309
 ] 

Wangda Tan commented on YARN-3368:
--

I just created several sub-tasks. Please feel free to assign one to yourself 
if you are interested.

Thanks!

> [Umbrella] YARN web UI: Next generation
> ---
>
> Key: YARN-3368
> URL: https://issues.apache.org/jira/browse/YARN-3368
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
> Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) 
> yarn-ui-screenshots.zip
>
>
> The goal is to improve the YARN UI for better usability.
> We may take advantage of some existing front-end frameworks to build a 
> fancier, easier-to-use UI. 
> The old UI will continue to exist until we feel it's ready to flip to the 
> new UI.
> This serves as an umbrella JIRA to track the tasks. We can do this in a 
> branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072427#comment-15072427
 ] 

Hadoop QA commented on YARN-4183:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 6s 
{color} | {color:green} hadoop-yarn-site in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 8s 
{color} | {color:green} hadoop-yarn-site in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 57s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779596/YARN-4183.v1.001.patch
 |
| JIRA Issue | YARN-4183 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  |
| uname | Linux 56b10cd8b7e5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / fb00794 |
| Default Java | 

[jira] [Updated] (YARN-4520) FinishAppEvent is leaked in leveldb if no app's container running on this node

2015-12-27 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4520:
---
Attachment: YARN-4520.01.patch

> FinishAppEvent is leaked in leveldb if no app's container running on this node
> --
>
> Key: YARN-4520
> URL: https://issues.apache.org/jira/browse/YARN-4520
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4520.01.patch
>
>
> Once we restart the nodemanager, we see many logs like:
> 2015-12-28 11:59:18,725 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: FINISH_APPLICATION sent to absent application 
> application_1446103803043_9892
> We find that the app's containers were never started on the NM but were 
> released by the AM after being allocated. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2015-12-27 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072478#comment-15072478
 ] 

Rohith Sharma K S commented on YARN-4393:
-

I think Varun's analysis makes sense to me: we need not add dispatcher.await 
everywhere. We can use dispatcher.await only when asserting on an event type, 
or when asserting some functionality after the dispatcher has processed that 
event, e.g. asserting on the resulting values.
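
A minimal sketch of this pattern (assuming Hadoop's DrainDispatcher test 
helper, whose await() blocks until the internal event queue is drained, and 
Mockito; the ApplicationEventType/appEvent names are illustrative):

{code}
DrainDispatcher dispatcher = new DrainDispatcher();
dispatcher.init(new Configuration());
dispatcher.start();

EventHandler<ApplicationEvent> handler = mock(EventHandler.class);
dispatcher.register(ApplicationEventType.class, handler);

// Fire the event under test; no dispatcher.await() is needed here.
dispatcher.getEventHandler().handle(appEvent);

// Drain exactly once, right before the assertion that depends on the event
// having been processed.
dispatcher.await();
verify(handler).handle(appEvent);
{code}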

> TestResourceLocalizationService#testFailedDirsResourceRelease fails 
> intermittently
> --
>
> Key: YARN-4393
> URL: https://issues.apache.org/jira/browse/YARN-4393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.3
>
> Attachments: YARN-4393.01.patch
>
>
> [~ozawa] pointed out this failure on YARN-4380.
> Check 
> https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773
> {noformat}
> sts run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>  Time elapsed: 0.093 sec <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> eventHandler.handle(
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> Actual invocation has different arguments:
> eventHandler.handle(
> EventType: APPLICATION_INITED
> );
> -> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4482) Default values of several config parameters are missing

2015-12-27 Thread Mohammad Shahid Khan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Shahid Khan updated YARN-4482:
---
Assignee: (was: Mohammad Shahid Khan)

> Default values of several config parameters are missing 
> 
>
> Key: YARN-4482
> URL: https://issues.apache.org/jira/browse/YARN-4482
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.6.2, 2.6.3
>Reporter: Tianyin Xu
>Priority: Minor
>
> In {{yarn-default.xml}}, the default values of the following parameters are 
> commented out:
> {{yarn.client.failover-max-attempts}}
> {{yarn.client.failover-sleep-base-ms}}
> {{yarn.client.failover-sleep-max-ms}}
> Have these default values changed (I suppose so)? If so, we should update 
> them in {{yarn-default.xml}}. Right now, I don't know the real "default" 
> values...
> (yarn-default.xml)
> https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> Thanks!
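
For reference, one quick way to check the effective values at runtime (a 
sketch assuming a YARN client classpath; YarnConfiguration loads 
yarn-default.xml plus yarn-site.xml, so a null result means the key is 
genuinely unset):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrintFailoverDefaults {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    String[] keys = {
        "yarn.client.failover-max-attempts",
        "yarn.client.failover-sleep-base-ms",
        "yarn.client.failover-sleep-max-ms"};
    for (String key : keys) {
      // Prints "null" when no default is set anywhere on the classpath.
      System.out.println(key + " = " + conf.get(key));
    }
  }
}
{code}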



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4516) [YARN-3368] Use em-table to better render tables

2015-12-27 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072526#comment-15072526
 ] 

Sreenath Somarajapuram commented on YARN-4516:
--

[~gtCarrera9]
Feel free to take-up the task after checking with [~leftnoteasy]. Please let me 
know if you need any help with em-table.

> [YARN-3368] Use em-table to better render tables
> 
>
> Key: YARN-4516
> URL: https://issues.apache.org/jira/browse/YARN-4516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>
> Currently we're using DataTables, which isn't integrated with Ember.js very 
> well. Instead we can use em-table (see 
> https://github.com/sreenaths/em-table/wiki, which was created for the Tez 
> UI). It supports features such as selectable columns, pagination, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4482) Default values of several config parameters are missing

2015-12-27 Thread Mohammad Shahid Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072484#comment-15072484
 ] 

Mohammad Shahid Khan commented on YARN-4482:


Hi [~Tianyin Xu], I agree with you.
We can mark this JIRA as Won't Fix.

> Default values of several config parameters are missing 
> 
>
> Key: YARN-4482
> URL: https://issues.apache.org/jira/browse/YARN-4482
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.6.2, 2.6.3
>Reporter: Tianyin Xu
>Priority: Minor
>
> In {{yarn-default.xml}}, the default values of the following parameters are 
> commented out:
> {{yarn.client.failover-max-attempts}}
> {{yarn.client.failover-sleep-base-ms}}
> {{yarn.client.failover-sleep-max-ms}}
> Have these default values changed (I suppose so)? If so, we should update 
> them in {{yarn-default.xml}}. Right now, I don't know the real "default" 
> values...
> (yarn-default.xml)
> https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-27 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072480#comment-15072480
 ] 

MENG DING commented on YARN-4138:
-

You are right; I remembered that wrong.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4520) FinishAppEvent is leaked in leveldb if no app's container running on this node

2015-12-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072489#comment-15072489
 ] 

Hadoop QA commented on YARN-4520:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 (total was 53, now 54). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 25s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 1s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 32s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779609/YARN-4520.01.patch |
| JIRA Issue | YARN-4520 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d4a033bcd0a9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 

[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072481#comment-15072481
 ] 

Sunil G commented on YARN-4352:
---

ASF warnings are not related to this patch. 

This change covers a test-case fix, so test coverage seems fine. 
[~rohithsharma], is it OK?

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch, 0002-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see the tests in TestYarnClient, TestAMRMClient and TestNMClient get 
> timeout which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4520) FinishAppEvent is leaked in leveldb if no app's container running on this node

2015-12-27 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4520:
---
Attachment: YARN-4520.02.patch

fix checkstyle errors

> FinishAppEvent is leaked in leveldb if no app's container running on this node
> --
>
> Key: YARN-4520
> URL: https://issues.apache.org/jira/browse/YARN-4520
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4520.01.patch, YARN-4520.02.patch
>
>
> Once we restart the nodemanager, we see many logs like:
> 2015-12-28 11:59:18,725 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: FINISH_APPLICATION sent to absent application 
> application_1446103803043_9892
> We find that the app's containers were never started on the NM but were 
> released by the AM after being allocated. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4330) MiniYARNCluster prints multiple Failed to instantiate default resource calculator warning messages

2015-12-27 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072148#comment-15072148
 ] 

Inigo Goiri commented on YARN-4330:
---

[~ste...@apache.org], is this good to go?

> MiniYARNCluster prints multiple  Failed to instantiate default resource 
> calculator warning messages
> ---
>
> Key: YARN-4330
> URL: https://issues.apache.org/jira/browse/YARN-4330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 2.8.0
> Environment: OSX, JUnit
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-4330.01.patch
>
>
> Whenever I try to start a MiniYARNCluster on Branch-2 (commit #0b61cca), I 
> see multiple stack traces warning me that a resource calculator plugin could 
> not be created
> {code}
> (ResourceCalculatorPlugin.java:getResourceCalculatorPlugin(184)) - 
> java.lang.UnsupportedOperationException: Could not determine OS: Failed to 
> instantiate default resource calculator.
> java.lang.UnsupportedOperationException: Could not determine OS
> {code}
> This is a minicluster. It doesn't need resource calculation. It certainly 
> doesn't need test logs cluttered with even more stack traces, which will 
> only generate false alarms about tests failing. 
> There needs to be a way to turn this off, and the minicluster should have it 
> that way by default.
> Being ruthless and marking this as a blocker, because it's a fairly major 
> regression for anyone testing with the minicluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4315) NaN in Queue percentage for cluster apps page

2015-12-27 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4315:
---
Attachment: 0002-YARN-4315.patch

[~leftnoteasy]
Thank you for looking into the patch. I updated the patch based on your 
comments; please review.

> NaN in Queue percentage for cluster apps page
> -
>
> Key: YARN-4315
> URL: https://issues.apache.org/jira/browse/YARN-4315
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4315.patch, 0002-YARN-4315.patch, Snap1.jpg
>
>
> Steps to reproduce:
> # Submit an application 
> # Switch the RM and check the queue usage percentage
> The queue percentage is shown as NaN.
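
For what it's worth, NaN in a percentage column usually comes from a 0/0 
float division; a guard along these lines (a guess at the shape of the fix, 
not the actual patch) avoids it:

{code}
// In Java, 0f / 0f yields NaN; check the denominator before dividing.
float usedCapacity = 0f;
float totalCapacity = 0f;  // can be 0 right after an RM switch
float queuePercentage =
    totalCapacity == 0f ? 0f : (usedCapacity / totalCapacity) * 100f;
{code}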



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)