[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081723#comment-15081723
 ] 

Varun Saxena commented on YARN-2902:


[~djp], I have uploaded a patch backporting the changes to branch-2.6.
The changes in container-executor.c are not required, as that change was 
necessitated by changes made in YARN-3089, which is not in branch-2.6.
If YARN-3089 is ever brought into branch-2.6, it will break the fix made by 
this JIRA (while using LCE).
Would a comment on that JIRA be enough to avoid this potential problem?
However, YARN-3089 is unlikely to be backported to branch-2.6.

We must backport YARN-4354 (as you mentioned) and YARN-4380 (the newly added 
test here fails intermittently without it) if we get this into branch-2.6.

Also, should I reopen this JIRA to run QA on this patch?



> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed, then 
> resources are left in the DOWNLOADING state. If no other container comes 
> along and requests these resources, they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans, since the 
> cleanup never deletes resources in the DOWNLOADING state even if their 
> reference count is zero.
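
For readers unfamiliar with the cleanup path, here is a standalone sketch with 
hypothetical names (not the NM's actual LocalResourcesTracker code) of why a 
resource orphaned in DOWNLOADING is never reclaimed, even at reference count 
zero:

{code}
// Sketch only: the cleanup scan considers only fully LOCALIZED entries, so an
// entry stuck in DOWNLOADING survives every scan even when nothing references it.
enum State { DOWNLOADING, LOCALIZED }

class CachedResource {
  State state;
  int refCount;
  CachedResource(State state, int refCount) { this.state = state; this.refCount = refCount; }
}

class CleanupSketch {
  static boolean shouldDelete(CachedResource r) {
    return r.refCount == 0 && r.state == State.LOCALIZED;  // DOWNLOADING never qualifies
  }

  public static void main(String[] args) {
    CachedResource orphan = new CachedResource(State.DOWNLOADING, 0);
    System.out.println(shouldDelete(orphan));  // false -> the partial download leaks
  }
}
{code}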



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2016-01-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081685#comment-15081685
 ] 

Karthik Kambatla commented on YARN-3446:


Sorry for not looking at this since the last update. It looks like the patch 
no longer applies.

[~zxu] - mind updating the patch? I'll take a look more promptly this time.

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom includes blacklisted nodes. This makes jobs hang 
> forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.
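
For illustration, a standalone sketch (hypothetical types, not the 
FairScheduler code) of the adjustment being requested: subtract the 
blacklisted nodes' available resources from the headroom reported to the AM.

{code}
// Sketch only: headroom should not count capacity on nodes the AM has blacklisted,
// since the RM will never place that AM's containers there.
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NodeInfo {
  final String host;
  final long availableMB;
  NodeInfo(String host, long availableMB) { this.host = host; this.availableMB = availableMB; }
}

class HeadroomSketch {
  static long headroomMB(long clusterAvailableMB, List<NodeInfo> nodes, Set<String> blacklist) {
    long blacklistedMB = 0;
    for (NodeInfo n : nodes) {
      if (blacklist.contains(n.host)) {
        blacklistedMB += n.availableMB;
      }
    }
    return Math.max(0L, clusterAvailableMB - blacklistedMB);
  }

  public static void main(String[] args) {
    // Two nodes with 4096 MB free each; this AM has blacklisted n2.
    List<NodeInfo> nodes = Arrays.asList(new NodeInfo("n1", 4096), new NodeInfo("n2", 4096));
    System.out.println(headroomMB(8192, nodes, new HashSet<>(Arrays.asList("n2"))));  // 4096
  }
}
{code}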



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2902:
---
Attachment: YARN-2902-branch-2.6.01.patch

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed, then 
> resources are left in the DOWNLOADING state. If no other container comes 
> along and requests these resources, they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans, since the 
> cleanup never deletes resources in the DOWNLOADING state even if their 
> reference count is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081704#comment-15081704
 ] 

Hadoop QA commented on YARN-3446:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-3446 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12766024/YARN-3446.003.patch |
| JIRA Issue | YARN-3446 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10147/console |


This message was automatically generated.



> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom includes blacklisted nodes. This makes jobs hang 
> forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops

2016-01-04 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081807#comment-15081807
 ] 

Subru Krishnan commented on YARN-2575:
--

I am wondering whether we have covered the behavior for the following two 
scenarios:
  * When Reservation ACLs are not enabled - in this case, everyone should have 
access.
  * When Reservation ACLs are enabled but not defined - in this case too, I 
think everyone should have access. [~asuresh], can you confirm?
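
A minimal sketch of the semantics described above, with hypothetical names 
(not an actual YARN API):

{code}
// Sketch only: reservation ACL check with the two fallbacks discussed above.
import java.util.Arrays;
import java.util.Map;

class ReservationAclSketch {
  static boolean checkReservationAccess(boolean aclsEnabled,
                                        Map<String, String> queueAcls,  // queue -> comma-separated users
                                        String queue, String user) {
    if (!aclsEnabled) {
      return true;                                   // ACLs not enabled: everyone has access
    }
    String acl = queueAcls.get(queue);
    if (acl == null || acl.trim().isEmpty()) {
      return true;                                   // enabled but not defined: everyone has access
    }
    return acl.trim().equals("*")
        || Arrays.asList(acl.split("\\s*,\\s*")).contains(user);
  }
}
{code}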

> Consider creating separate ACLs for Reservation create/update/delete/list ops
> -
>
> Key: YARN-2575
> URL: https://issues.apache.org/jira/browse/YARN-2575
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-2575.v1.patch, YARN-2575.v2.1.patch, 
> YARN-2575.v2.patch, YARN-2575.v3.patch, YARN-2575.v4.patch
>
>
> YARN-1051 introduces the ReservationSystem and in the current implementation 
> anyone who can submit applications can also submit reservations. This JIRA is 
> to evaluate creating separate ACLs for Reservation create/update/delete ops.
> Depends on YARN-4340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4528) decreaseContainer Message maybe lost if NM restart

2016-01-04 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081550#comment-15081550
 ] 

MENG DING commented on YARN-4528:
-

Hi, [~sandflee]

With the current logic, I think the RM won't know whether a container decrease 
msg has really been persisted in the NM state store, even if you decrease the 
resource synchronously in the NM. For example, suppose we synchronously 
decrease the resource in the NM and something goes wrong while writing to the 
NM state store: an exception will be thrown, and it will be caught by the 
following statement during status update in the NM:

{code}
catch (Throwable e) {
  // TODO Better error handling. Thread can die with the rest of the
  // NM still running.
  LOG.error("Caught exception in status-updater", e);
}
{code}

So to me, there is really no benefit to decreasing container resources 
synchronously in the NM, is there?

> decreaseContainer Message maybe lost if NM restart
> --
>
> Key: YARN-4528
> URL: https://issues.apache.org/jira/browse/YARN-4528
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
> Attachments: YARN-4528.01.patch
>
>
> We may pend the container decrease msg until the next heartbeat, or check the 
> resource against the rmContainer when the node registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3692) Allow REST API to set a user generated message when killing an application

2016-01-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081644#comment-15081644
 ] 

Jason Lowe commented on YARN-3692:
--

We cannot change the signature of the existing killApplication method or we 
break backwards compatibility.  However, as [~Naganarasimha] mentioned, we can 
simply add another method that takes the two arguments instead of one.  Then we 
can support both the old method for backwards compatibility and the new method, 
which allows a user-provided diagnostic.  I'm not sure we should deprecate the 
old method just yet.  We can still generate a useful diagnostic message 
automatically on the RM side when one is not provided, such as which user from 
which host issued the kill command.
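
A sketch of the shape of such an addition, using a hypothetical interface 
rather than the real ApplicationClientProtocol signatures:

{code}
// Hypothetical interface, for illustration only.
interface AppKiller {
  // Existing single-argument form stays as-is for backwards compatibility.
  void killApplication(String applicationId);

  // New overload carrying a user-provided diagnostic. When the diagnostic is
  // null or empty, the RM can still auto-generate one (e.g. which user from
  // which host issued the kill).
  void killApplication(String applicationId, String diagnostics);
}
{code}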

> Allow REST API to set a user generated message when killing an application
> --
>
> Key: YARN-3692
> URL: https://issues.apache.org/jira/browse/YARN-3692
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajat Jain
>Assignee: Rohith Sharma K S
>
> Currently YARN's REST API supports killing an application without setting a 
> diagnostic message. It would be good to provide that support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4536) DelayedProcessKiller may not work under heavy workload

2016-01-04 Thread gu-chi (JIRA)
gu-chi created YARN-4536:


 Summary: DelayedProcessKiller may not work under heavy workload
 Key: YARN-4536
 URL: https://issues.apache.org/jira/browse/YARN-4536
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: gu-chi


I am now facing orphaned container processes. Here is the scenario:
Under a heavy task load, the NM machine's CPU usage can reach almost 100%. When 
a container gets a kill event, it receives a {{SIGTERM}}, and then the parent 
process exits, leaving the container process to the OS. The container process 
needs to handle some shutdown events or other logic, but it can hardly get any 
CPU. We would expect to see a {{SIGKILL}} since there is a 
{{DelayedProcessKiller}}, but the parent process that was persisted as the 
container pid no longer exists, so the kill command cannot reach the container 
process. This is how orphaned container processes arise.
The orphaned process does exit after some time, but the period can be very 
long and makes the OS state worse. As I observed, the period can be several 
hours.
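
One mitigation sometimes used for this class of problem (not necessarily what 
this JIRA will end up doing, and assuming the container was launched in its 
own session via setsid so its pid doubles as the process-group id) is to 
signal the whole group instead of the recorded parent pid:

{code}
// Sketch only: SIGKILL the container's process group so the signal still reaches
// the children after the parent shell has exited. Assumes Linux and a setsid launch.
import java.io.IOException;

class GroupKillSketch {
  static void sigkillGroup(int containerPid) throws IOException, InterruptedException {
    // The negative pid after "--" addresses the whole process group.
    new ProcessBuilder("bash", "-c", "kill -9 -- -" + containerPid)
        .inheritIO()
        .start()
        .waitFor();
  }
}
{code}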



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-04 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4537:

Summary: Pull out priority comparison from fifocomparator and use compound 
comparator for FifoOrdering policy  (was: Pull out priority comparison from 
fifocomparator and use compound comparator for FIFOOrdering policy)

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that down 
> the line if any new ordering policy wants to integrate priority, they can use 
> compound comparator where priority will be high preference. 
> The following changes are expected to be done as part of this JIRA
> # Pull out priority comparison from FifoComparator
> # Define new priority comparator
> # Use compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FIFOOrdering policy

2016-01-04 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4537:
---

 Summary: Pull out priority comparison from fifocomparator and use 
compound comparator for FIFOOrdering policy
 Key: YARN-4537
 URL: https://issues.apache.org/jira/browse/YARN-4537
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


Currently, priority comparison is integrated with FifoComparator. There should 
be a separate comparator defined for priority comparison so that down the line 
if any new ordering policy wants to integrate priority, they can use compound 
comparator where priority will be high preference. 

The following changes are expected to be done as part of this JIRA
# Pull out priority comparison from FifoComparator
# Define new priority comparator
# Use compound comparator for FifoOrderingPolicy. Order of preference is 
Priority, FIFO
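
A standalone sketch (hypothetical types, not the scheduler's actual 
SchedulableEntity classes) of the comparator split described above: priority 
first, then FIFO on ties.

{code}
// Sketch only: a dedicated priority comparator composed with a FIFO comparator.
import java.util.Comparator;

class AppSketch {
  final int priority;      // higher value = higher priority
  final long submitTime;   // earlier submission = earlier in FIFO order
  AppSketch(int priority, long submitTime) { this.priority = priority; this.submitTime = submitTime; }
}

class OrderingPolicySketch {
  static final Comparator<AppSketch> PRIORITY =
      (a, b) -> Integer.compare(b.priority, a.priority);   // descending priority

  static final Comparator<AppSketch> FIFO =
      Comparator.comparingLong(a -> a.submitTime);          // ascending submit time

  // Compound comparator; order of preference is Priority, then FIFO.
  static final Comparator<AppSketch> FIFO_ORDERING = PRIORITY.thenComparing(FIFO);
}
{code}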



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2016-01-04 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4352:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: YARN-4478)

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch, 0002-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see that the tests in TestYarnClient, TestAMRMClient and TestNMClient 
> time out, which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-04 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4538:
--

 Summary: QueueMetrics pending  cores and memory metrics wrong
 Key: YARN-4538
 URL: https://issues.apache.org/jira/browse/YARN-4538
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt


Submit 2 applications to the default queue.
Check the queue metrics for pending cores and memory.

{noformat}
List<QueueInfo> allQueues = client.getChildQueueInfos("root");

for (QueueInfo queueInfo : allQueues) {
  QueueStatistics quastats = queueInfo.getQueueStatistics();
  System.out.println(quastats.getPendingVCores());
  System.out.println(quastats.getPendingMemoryMB());
}

{noformat}

*Output :*

-20
-20480




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081062#comment-15081062
 ] 

Sunil G commented on YARN-4537:
---

Thanks [~rohithsharma] for sharing the patch.
Generally the patch looks good.

A few minor nits:
1. 
{noformat}
private CompoundComparator fifoComparator;
{noformat}
I feel this variable is not needed in {{FifoOrderingPolicy}}.

2. Also, in {{PriorityComparator}} we can remove the temporary variable 
{{res}} if needed.
3. In {{TestFifoOrderingPolicy}}, could you please add a case where priority 
is null too? Since we have a comparator, it is good to cover all the corner 
cases.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that down 
> the line if any new ordering policy wants to integrate priority, they can use 
> compound comparator where priority will be high preference. 
> The following changes are expected to be done as part of this JIRA
> # Pull out priority comparison from FifoComparator
> # Define new priority comparator
> # Use compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4528) decreaseContainer Message maybe lost if NM restart

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081043#comment-15081043
 ] 

Hadoop QA commented on YARN-4528:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 24s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server (total was 134, now 135). 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 4s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 4s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 17s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 21s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 171m 3s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 

[jira] [Updated] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FIFOOrdering policy

2016-01-04 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4537:

Attachment: 0001-YARN-4537.patch

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FIFOOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that down 
> the line if any new ordering policy wants to integrate priority, they can use 
> compound comparator where priority will be high preference. 
> The following changes are expected to be done as part of this JIRA
> # Pull out priority comparison from FifoComparator
> # Define new priority comparator
> # Use compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-04 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081051#comment-15081051
 ] 

Rohith Sharma K S commented on YARN-4538:
-

Is this the same as YARN-4481?

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Submit 2 applications to the default queue.
> Check the queue metrics for pending cores and memory.
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081068#comment-15081068
 ] 

Sunil G commented on YARN-4304:
---

Test case failures are not related.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> 0008-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage 
> configuration, the UI and various metrics also need to display the correct 
> configuration for the same. 
> For example, the current UI still shows the am-resource percentage at the 
> queue level. This is to be updated correctly when label configuration is used.
> - Display max-am-percentage per-partition in the Scheduler UI (including 
> labels) and in the ClusterMetrics page
> - Update queue/partition related metrics w.r.t. the per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4535) Fix checkstyle error in CapacityScheduler.java

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081065#comment-15081065
 ] 

Hadoop QA commented on YARN-4535:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 59s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 29s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 141m 56s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081053#comment-15081053
 ] 

Sunil G commented on YARN-4538:
---

Hi [~bibinchundatt],
Could you please upload the RM and NM logs? We have been tracing this issue 
for some time.

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Submit 2 applications to the default queue.
> Check the queue metrics for pending cores and memory.
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4535) Fix checkstyle error in CapacityScheduler.java

2016-01-04 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4535:

Attachment: YARN-4535.v1.001.patch

Attaching the patch; it includes no test case as this is a trivial patch.

> Fix checkstyle error in CapacityScheduler.java
> --
>
> Key: YARN-4535
> URL: https://issues.apache.org/jira/browse/YARN-4535
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Naganarasimha G R
>Priority: Trivial
> Attachments: YARN-4535.v1.001.patch
>
>
> In the code below from *CS#parseQueue*, the expression can be simplified: 
> {{queue instanceof LeafQueue == true}} and {{queues.get(queueName) instanceof 
> LeafQueue == true}} do not need the {{== true}} comparison.
> {code}
>  if(queue instanceof LeafQueue == true && queues.containsKey(queueName)
>   && queues.get(queueName) instanceof LeafQueue == true) {
>   throw new IOException("Two leaf queues were named " + queueName
> + ". Leaf queue names must be distinct");
> }
> {code}
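
For reference, a sketch of the simplified form being suggested (dropping the 
redundant {{== true}} comparisons); the actual patch may differ:

{code}
if (queue instanceof LeafQueue && queues.containsKey(queueName)
    && queues.get(queueName) instanceof LeafQueue) {
  throw new IOException("Two leaf queues were named " + queueName
      + ". Leaf queue names must be distinct");
}
{code}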



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-04 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4479:

Attachment: 0004-YARN-4479.patch

Updated the patch, fixing some of the checkstyle and findbugs warnings.

> Retrospect app-priority in pendingOrderingPolicy during recovering 
> applications
> ---
>
> Key: YARN-4479
> URL: https://issues.apache.org/jira/browse/YARN-4479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, 
> 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch
>
>
> Currently, the same ordering policy is used for pending applications and 
> active applications. When priority is configured for applications, during 
> recovery the high-priority applications get activated first. It is possible 
> that a low-priority job had been submitted and was in the running state.
> This causes the low-priority job to starve after recovery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2016-01-04 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080805#comment-15080805
 ] 

Vinayakumar B commented on YARN-4352:
-

+1  LGTM

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch, 0002-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see that the tests in TestYarnClient, TestAMRMClient and TestNMClient 
> time out, which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2016-01-04 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080806#comment-15080806
 ] 

Vinayakumar B commented on YARN-4352:
-

Also, this can be moved to hadoop-common before committing, since the change 
is only in hadoop-common.

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch, 0002-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see that the tests in TestYarnClient, TestAMRMClient and TestNMClient 
> time out, which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4232) TopCLI console support for HA mode

2016-01-04 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4232:
---
Attachment: 0002-YARN-4232.patch

Attaching a patch for review.
Cluster info proto added.

> TopCLI console support for HA mode
> --
>
> Key: YARN-4232
> URL: https://issues.apache.org/jira/browse/YARN-4232
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4232.patch, 0002-YARN-4232.patch
>
>
> *Steps to reproduce*
> Start Top command in YARN in HA mode
> ./yarn top
> {noformat}
> usage: yarn top
>  -cols  Number of columns on the terminal
>  -delay The refresh delay(in seconds), default is 3 seconds
>  -help   Print usage; for help while the tool is running press 'h'
>  + Enter
>  -queuesComma separated list of queues to restrict applications
>  -rows  Number of rows on the terminal
>  -types Comma separated list of types to restrict applications,
>  case sensitive(though the display is lower case)
>  -users Comma separated list of users to restrict applications
> {noformat}
> Execute *for help while the tool is running press 'h'  + Enter* while top 
> tool is running
> Exception is thrown in console continuously
> {noformat}
> 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time
> java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at java.net.Socket.connect(Socket.java:538)
> at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
> at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
> at sun.net.www.http.HttpClient.New(HttpClient.java:308)
> at sun.net.www.http.HttpClient.New(HttpClient.java:326)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
> at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
> at 
> org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742)
> at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4528) decreaseContainer Message maybe lost if NM restart

2016-01-04 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4528:
---
Attachment: YARN-4528.01.patch

1. Pend the container decrease msg until the next heartbeat.
2. nodemanager#allocate decreases the resource directly.
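
For context, a standalone sketch (hypothetical names, not the RM's actual 
classes) of the first idea: keep unacknowledged decrease messages and re-send 
them on each heartbeat until the NM confirms them, so an NM restart cannot 
silently drop the message.

{code}
// Sketch only: pending decrease messages survive until the NM acknowledges them.
import java.util.ArrayList;
import java.util.List;

class PendingDecreaseSketch {
  private final List<String> pendingDecreases = new ArrayList<>();  // container ids

  synchronized void queueDecrease(String containerId) {
    pendingDecreases.add(containerId);
  }

  // Called when building each heartbeat response for the node.
  synchronized List<String> decreasesToSend() {
    return new ArrayList<>(pendingDecreases);
  }

  // Called when the NM reports the container at its decreased resource.
  synchronized void ack(String containerId) {
    pendingDecreases.remove(containerId);
  }
}
{code}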

> decreaseContainer Message maybe lost if NM restart
> --
>
> Key: YARN-4528
> URL: https://issues.apache.org/jira/browse/YARN-4528
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
> Attachments: YARN-4528.01.patch
>
>
> We may pend the container decrease msg until the next heartbeat, or check the 
> resource against the rmContainer when the node registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2902:
-
Target Version/s: 2.7.2, 2.6.4  (was: 2.7.2)

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed, then 
> resources are left in the DOWNLOADING state. If no other container comes 
> along and requests these resources, they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans, since the 
> cleanup never deletes resources in the DOWNLOADING state even if their 
> reference count is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4412) Create ClusterManager to compute ordered list of preferred NMs for OPPORTUNITIC containers

2016-01-04 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082079#comment-15082079
 ] 

Arun Suresh commented on YARN-4412:
---

I meant: "solicit feedback on the approach" :)

> Create ClusterManager to compute ordered list of preferred NMs for 
> OPPORTUNITIC containers
> --
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-4412-yarn-2877.v1.patch
>
>
> Introduce a Cluster Manager that aggregates Load and Policy information from 
> individual Node Managers and computes an ordered list of preferred Node 
> managers to be used as target Nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4528) decreaseContainer Message maybe lost if NM restart

2016-01-04 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082083#comment-15082083
 ] 

MENG DING commented on YARN-4528:
-

Honestly, I don't think the design needs to be changed, unless other people 
think differently. As you said, this RARELY, if ever, happens. Also, we 
acknowledged that the AM only issues a decrease request when it knows that a 
container doesn't need the original amount of resource, and a failed decrease 
message in the NM is not at all fatal (unlike a failed increase message, which 
may cause the container to be killed by the resource enforcement).

> decreaseContainer Message maybe lost if NM restart
> --
>
> Key: YARN-4528
> URL: https://issues.apache.org/jira/browse/YARN-4528
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
> Attachments: YARN-4528.01.patch
>
>
> We may pend the container decrease msg until the next heartbeat, or check the 
> resource against the rmContainer when the node registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4412) Create ClusterManager to compute ordered list of preferred NMs for OPPORTUNITIC containers

2016-01-04 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4412:
--
Summary: Create ClusterManager to compute ordered list of preferred NMs for 
OPPORTUNITIC containers  (was: Create ClusterManager to compute ordered list of 
preferred NMs for QUEUEABLE containers)

> Create ClusterManager to compute ordered list of preferred NMs for 
> OPPORTUNITIC containers
> --
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce a Cluster Manager that aggregates Load and Policy information from 
> individual Node Managers and computes an ordered list of preferred Node 
> managers to be used as target Nodes for QUEUEABLE container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-04 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4265:

Attachment: YARN-4265-trunk.002.patch

Thanks [~djp] for the review! In the 002 patch I addressed most of the 
checkstyle problems, as well as most existing comments. Please feel free to add 
more. Some comments:

bq. I noticed that we are setting 1 minutes as default scan interval but 
original patch in HDFS-3942 is 5 minutes. Why shall we do any update here? 
For now I increased the default frequency to scan HDFS and pull timeline data. 
Having a 5-minute time interval means users are less likely to see any running 
status for apps that finish within 5 minutes. Right now I'm setting this value 
to 1 minute to reduce the reader's reaction time. 

bq. The same question on "app-cache-size", the default value in HDFS-3942 is 5 
but here is 10. Any reason to update the value?
In YARN-3942, caching is performed on application level. In this patch, caching 
is performed in entity groups. Each application may have a few to tens of 
entity groups. Normally, there are slightly more active entity groups than 
active applications in the system. For now, I'm increasing this default value 
to hold slightly more entity groups in cache. 

bq. Why we don't have any default value specified in property of 
"yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes"?
Plugins are provided by third-party applications such as Tez. Right now we 
cannot assume which exact entity group plugin the user is using, therefore we 
have to conservatively leave this config as empty. 

bq. For EmptyTimelineEntityGroupPlugin.java, why we need this class? I didn't 
see any help provided even in tests. We should remove it if useless.
Ah, nice catch. Removed it. 

bq. Can we optimize the synchronization logic here? Like in synchronized method 
refreshCache, we are intialize/start/stop TimelineDataManager (and 
MemoryTimelineStore) which is quite expensive and unnecessary to block other 
synchronized operations. Shall we move these operations out of synchronized 
block?
It's certainly doable. Right now I have yet to optimize this part because it's 
a little bit tricky to fine-tune synchronization performance before we have a 
relatively stable starting point. Also, we're using fine-grained locking for 
each cached item in the reader cache, and cache refresh only happens 
infrequently (~10 secs by default), so maybe we'd like to stabilize the whole 
synchronization story before fine-tuning this part? 

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior as the 
> EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id 
> granularity, instead of application id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081950#comment-15081950
 ] 

Bikas Saha commented on YARN-1011:
--

In Tez we always try to allocate the most important work to the next allocated 
container. So doing opportunistic containers without providing the AM with the 
ability to know about them and use them judiciously may not be something that 
can be delayed to a second phase.

Being able to choose only guaranteed or non-guaranteed containers only covers 
half the problem (and probably the less relevant one) in which an application 
should always finish in 1min using guaranteed capacity but may sometimes finish 
in 30s because it got opportunistic containers. The other side is probably more 
important where a regression is caused due to opportunistic containers. 
1) The app got opportunistic containers and their perf wasn't the same as normal 
containers - so it ran slower. This may be mitigated by the system guaranteeing 
that only excess containers beyond guaranteed capacity would be opportunistic. 
This would require that the system upgrade opportunistic containers in the same 
order as it would allocate containers. However, things get complicated because 
a node with an opportunistic container may continue to run its normal 
containers while space frees up for guaranteed capacity on other nodes. At this 
point, which container becomes guaranteed - the new one on a free node or the 
opportunistic one that is already doing work? Which one should be preempted?
2) the app suffered because its guaranteed containers got slowed down due to 
competition from opportunistic containers. This needs strong support for lower 
priority resource consumption for opportunistic containers.

IMO, the NM cannot make a local choice about upgrading its opportunistic 
containers because this is effectively a resource allocation decision and only 
the RM has the info to do that. The NM does not know if this would exceed 
guaranteed capacity and in total, a bunch of NMs making this choice locally can 
lead to excessive over-allocation of guaranteed resources.



> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4468) Document the general ReservationSystem functionality, and the REST API

2016-01-04 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-4468:
--

Assignee: Carlo Curino

> Document the general ReservationSystem functionality, and the REST API
> --
>
> Key: YARN-4468
> URL: https://issues.apache.org/jira/browse/YARN-4468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4468.1.patch, YARN-4468.rest-only.patch
>
>
> This JIRA tracks effort to document the ReservationSystem functionality, and 
> the REST API access to it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4542) Cleanup AHS code and configuration

2016-01-04 Thread Junping Du (JIRA)
Junping Du created YARN-4542:


 Summary: Cleanup AHS code and configuration
 Key: YARN-4542
 URL: https://issues.apache.org/jira/browse/YARN-4542
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du


ATS (with many versions so far) is designed to replace AHS. We should consider 
cleaning up AHS-related configuration and code later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4412) Create ClusterManager to compute ordered list of preferred NMs for OPPORTUNITIC containers

2016-01-04 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4412:
--
Attachment: YARN-4412-yarn-2877.v1.patch

Attaching initial patch to solicit approach. It depends on YARN-2883 and 
YARN-2885. Test cases and general clean-up to follow

> Create ClusterManager to compute ordered list of preferred NMs for 
> OPPORTUNITIC containers
> --
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-4412-yarn-2877.v1.patch
>
>
> Introduce a Cluster Manager that aggregates Load and Policy information from 
> individual Node Managers and computes an ordered list of preferred Node 
> managers to be used as target Nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running' applications

2016-01-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3154:
-
Target Version/s: 2.7.0, 2.6.4  (was: 2.7.0, 2.6.1)

> Should not upload partial logs for MR jobs or other "short-running' 
> applications 
> -
>
> Key: YARN-3154
> URL: https://issues.apache.org/jira/browse/YARN-3154
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, 
> YARN-3154.4.patch
>
>
> Currently, if we are running a MR job, and we do not set the log interval 
> properly, we will have their partial logs uploaded and then removed from the 
> local filesystem which is not right.
> We only upload the partial logs for LRS applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running' applications

2016-01-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082012#comment-15082012
 ] 

Junping Du commented on YARN-3154:
--

Hi [~xgong] and [~vinodkv], shall we consider backporting the fix to branch-2.6 
as well?

> Should not upload partial logs for MR jobs or other "short-running' 
> applications 
> -
>
> Key: YARN-3154
> URL: https://issues.apache.org/jira/browse/YARN-3154
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, 
> YARN-3154.4.patch
>
>
> Currently, if we are running a MR job, and we do not set the log interval 
> properly, we will have their partial logs uploaded and then removed from the 
> local filesystem which is not right.
> We only upload the partial logs for LRS applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4541) Change log message in LocalizedResource#handle() to DEBUG

2016-01-04 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-4541:


 Summary: Change log message in LocalizedResource#handle() to DEBUG
 Key: YARN-4541
 URL: https://issues.apache.org/jira/browse/YARN-4541
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.8.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor


This section of code can fill up a log fairly quickly.

   if (oldState != newState) {
LOG.info("Resource " + resourcePath + (localPath != null ?
  "(->" + localPath + ")": "") + " transitioned from " + oldState
+ " to " + newState);
   }
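
For reference, a minimal sketch (assuming the {{LOG}} field already present in 
LocalizedResource) of the DEBUG-guarded form the summary suggests; the actual patch may differ:

{code}
// Sketch only: same message, but at DEBUG level and guarded so the string
// concatenation is skipped entirely when DEBUG logging is disabled.
if (oldState != newState && LOG.isDebugEnabled()) {
  LOG.debug("Resource " + resourcePath + (localPath != null ?
      "(->" + localPath + ")" : "") + " transitioned from " + oldState
      + " to " + newState);
}
{code}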




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser

2016-01-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082008#comment-15082008
 ] 

Junping Du commented on YARN-3089:
--

Hi [~eepayne] and [~jlowe], does this fix need to be cherry-picked to 
branch-2.6?

> LinuxContainerExecutor does not handle file arguments to deleteAsUser
> -
>
> Key: YARN-3089
> URL: https://issues.apache.org/jira/browse/YARN-3089
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3089.v1.txt, YARN-3089.v2.txt, YARN-3089.v3.txt
>
>
> YARN-2468 added the deletion of individual logs that are aggregated, but this 
> fails to delete log files when the LCE is being used.  The LCE native 
> executable assumes the paths being passed are paths and the delete fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081826#comment-15081826
 ] 

Wangda Tan commented on YARN-4304:
--

1) REST response: amResourceLimit -> amLimit (it's already part of the resources 
section, so we don't need to mention that it's a resource).
2) REST response: for a parent queue, queueCapacitiesByPartition contains 
maxAMLimitPercentage and resourceUsagesByPartition contains amResourceLimit. I 
would suggest adding a flag so that 
queueCapacitiesByPartition/resourceUsagesByPartition include the AM-resource-related 
fields only when the queue is a leaf queue.
3) I'm not sure whether a user's AM limit can ever exceed the queue's AM limit; if 
it can, we should cap the user's AM limit at the queue's AM limit.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> 0008-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition level max AM resource percentage 
> configuration, UI and various metrics also need to display correct 
> configurations related to same. 
> For eg: Current UI still shows am-resource percentage per queue level. This 
> is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4541) Change log message in LocalizedResource#handle() to DEBUG

2016-01-04 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-4541:
-
Description: 
This section of code can fill up a log fairly quickly.

{quote}
   if (oldState != newState) {
LOG.info("Resource " + resourcePath + (localPath != null ?
  "(->" + localPath + ")": "") + " transitioned from " + oldState
+ " to " + newState);
   }
{quote}

  was:
This section of code can fill up a log fairly quickly.

   if (oldState != newState) {
LOG.info("Resource " + resourcePath + (localPath != null ?
  "(->" + localPath + ")": "") + " transitioned from " + oldState
+ " to " + newState);
   }



> Change log message in LocalizedResource#handle() to DEBUG
> -
>
> Key: YARN-4541
> URL: https://issues.apache.org/jira/browse/YARN-4541
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
> Attachments: YARN-4541.001.patch
>
>
> This section of code can fill up a log fairly quickly.
> {quote}
>if (oldState != newState) {
> LOG.info("Resource " + resourcePath + (localPath != null ?
>   "(->" + localPath + ")": "") + " transitioned from " + oldState
> + " to " + newState);
>}
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4541) Change log message in LocalizedResource#handle() to DEBUG

2016-01-04 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-4541:
-
Description: 
This section of code can fill up a log fairly quickly.

{code}
   if (oldState != newState) {
LOG.info("Resource " + resourcePath + (localPath != null ?
  "(->" + localPath + ")": "") + " transitioned from " + oldState
+ " to " + newState);
   }
{code}

  was:
This section of code can fill up a log fairly quickly.

{quote}
   if (oldState != newState) {
LOG.info("Resource " + resourcePath + (localPath != null ?
  "(->" + localPath + ")": "") + " transitioned from " + oldState
+ " to " + newState);
   }
{quote}


> Change log message in LocalizedResource#handle() to DEBUG
> -
>
> Key: YARN-4541
> URL: https://issues.apache.org/jira/browse/YARN-4541
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
> Attachments: YARN-4541.001.patch
>
>
> This section of code can fill up a log fairly quickly.
> {code}
>if (oldState != newState) {
> LOG.info("Resource " + resourcePath + (localPath != null ?
>   "(->" + localPath + ")": "") + " transitioned from " + oldState
> + " to " + newState);
>}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082021#comment-15082021
 ] 

Junping Du commented on YARN-2902:
--

bq. However, YARN-3089 is unlikely to be backported to branch-2.6
I just pinged the author/reviewer on that JIRA. Let's wait and see. Does that 
mean that if we need to pull in that fix, the patch here needs to be updated? If 
so, we should hold off on this patch for a while.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4412) Create ClusterManager to compute ordered list of preferred NMs for OPPORTUNISTIC containers

2016-01-04 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4412:
--
Description: 
Introduce a Cluster Manager that aggregates Load and Policy information from 
individual Node Managers and computes an ordered list of preferred Node 
managers to be used as target Nodes for OPPORTUNISTIC container allocations. 

This list can be pushed out to the Node Manager (specifically the AMRMProxy 
running on the Node) via the Allocate Response. This will be used to make local 
Scheduling decisions

  was:
Introduce a Cluster Manager that aggregates Load and Policy information from 
individual Node Managers and computes an ordered list of preferred Node 
managers to be used as target Nodes for QUEUEABLE container allocations. 

This list can be pushed out to the Node Manager (specifically the AMRMProxy 
running on the Node) via the Allocate Response. This will be used to make local 
Scheduling decisions


> Create ClusterManager to compute ordered list of preferred NMs for 
> OPPORTUNISTIC containers
> --
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> Introduce a Cluster Manager that aggregates Load and Policy information from 
> individual Node Managers and computes an ordered list of preferred Node 
> managers to be used as target Nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4541) Change log message in LocalizedResource#handle() to DEBUG

2016-01-04 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-4541:
-
Attachment: YARN-4541.001.patch

> Change log message in LocalizedResource#handle() to DEBUG
> -
>
> Key: YARN-4541
> URL: https://issues.apache.org/jira/browse/YARN-4541
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
> Attachments: YARN-4541.001.patch
>
>
> This section of code can fill up a log fairly quickly.
>if (oldState != newState) {
> LOG.info("Resource " + resourcePath + (localPath != null ?
>   "(->" + localPath + ")": "") + " transitioned from " + oldState
> + " to " + newState);
>}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082000#comment-15082000
 ] 

Junping Du commented on YARN-2902:
--

Thanks [~varun_saxena] for replying and delivering the patch!
bq. Also should I reopen this JIRA to run QA on this patch?
It depends on whether there are many conflicts when merging with the original patch. 
If the answer is yes, then we'd better reopen the JIRA and run the Jenkins test.


> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4542) Cleanup AHS code and configuration

2016-01-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4542:
-
Description: ATS (Application Timeline Server/Service; we already have many 
versions so far) has been designed and implemented to replace AHS for a long 
time. We should consider cleaning up AHS-related configuration and code later.  
(was: ATS (many versions so far) is designed to replace AHS. We should consider 
to cleanup AHS related configuration and code later.)

> Cleanup AHS code and configuration
> --
>
> Key: YARN-4542
> URL: https://issues.apache.org/jira/browse/YARN-4542
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>
> ATS (Application Timeline Server/Service; we already have many versions so 
> far) has been designed and implemented to replace AHS for a long time. We 
> should consider cleaning up AHS-related configuration and code later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3692) Allow REST API to set a user generated message when killing an application

2016-01-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082086#comment-15082086
 ] 

Naganarasimha G R commented on YARN-3692:
-

bq. We can still generate a useful diagnostic message automatically on the RM 
side when one is not provided, such as which user from which host issued the 
kill command.
+1 for this approach when no diagnostic message is given by the user/admin
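
As a toy illustration only (callerUser and callerHost are assumed inputs, not actual RM field 
names), the kind of default diagnostic described above could look like:

{code}
/** Sketch: build a default diagnostic when the caller supplies none. */
static String defaultDiagnostic(String userSupplied, String callerUser, String callerHost) {
  if (userSupplied != null && !userSupplied.isEmpty()) {
    return userSupplied;
  }
  return "Application killed by user " + callerUser + " from host " + callerHost;
}
{code}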

> Allow REST API to set a user generated message when killing an application
> --
>
> Key: YARN-3692
> URL: https://issues.apache.org/jira/browse/YARN-3692
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajat Jain
>Assignee: Rohith Sharma K S
>
> Currently YARN's REST API supports killing an application without setting a 
> diagnostic message. It would be good to provide that support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-04 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082152#comment-15082152
 ] 

Li Lu commented on YARN-4224:
-

Thanks [~varun_saxena]. I briefly looked at the endpoints, and they generally look 
fine. Let's settle the entity-type-related issues this Wednesday, but I believe 
most of the flow-related parts are fine. I'll start the flow-related UI work on 
top of this. (So if anyone has any concerns, please send out your comments, thanks!) 
One quick thing is about TimelineReaderContext. This class appears to duplicate a 
lot of logic from TimelineCollectorContext and serves a similar role. Do we want 
to reorganize or consolidate them? 

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4535) Fix checkstyle error in CapacityScheduler.java

2016-01-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082338#comment-15082338
 ] 

Naganarasimha G R commented on YARN-4535:
-

[~rohithsharma],
The test case failures are not related to the patch, and it's a trivial fix that 
doesn't require test case changes.

> Fix checkstyle error in CapacityScheduler.java
> --
>
> Key: YARN-4535
> URL: https://issues.apache.org/jira/browse/YARN-4535
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Naganarasimha G R
>Priority: Trivial
> Attachments: YARN-4535.v1.001.patch
>
>
> In the below code *CS#parseQueue*, expression can be simplified instead of 
> {{queue instanceof LeafQueue == true}} & {{queues.get(queueName) instanceof 
> LeafQueue == true}}
> {code}
>  if(queue instanceof LeafQueue == true && queues.containsKey(queueName)
>   && queues.get(queueName) instanceof LeafQueue == true) {
>   throw new IOException("Two leaf queues were named " + queueName
> + ". Leaf queue names must be distinct");
> }
> {code}
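
For reference, a sketch of the simplified condition the description asks for (dropping the 
redundant {{== true}} comparisons); the committed patch may phrase it differently:

{code}
if (queue instanceof LeafQueue && queues.containsKey(queueName)
    && queues.get(queueName) instanceof LeafQueue) {
  throw new IOException("Two leaf queues were named " + queueName
      + ". Leaf queue names must be distinct");
}
{code}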



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4536) DelayedProcessKiller may not work under heavy workload

2016-01-04 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082168#comment-15082168
 ] 

Jun Gong commented on YARN-4536:


[~gu chi] Thanks for explaining it. Yes, we also came across this problem and have 
applied the patch from YARN-4459; it works well now. I explained more in that 
issue's comments. Maybe you could help review and try it. Thanks.

> DelayedProcessKiller may not work under heavy workload
> --
>
> Key: YARN-4536
> URL: https://issues.apache.org/jira/browse/YARN-4536
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gu-chi
>
> I am now facing with orphan process of container. Here is the scenario:
> With heavy task load, the NM machine CPU usage can reach almost 100%. When 
> some container got event of kill, it will get  {{SIGTERM}} , and then the 
> parent process exit, leave the container process to OS. This container 
> process need handle some shutdown events or some logic, but hardly can get 
> CPU, we suppose to see a {{SIGKILL}} as there is {{DelayedProcessKiller}} 
> ,but the parent process which persisted as container pid no longer exist, so 
> the kill command can not reach the container process. This is how orphan 
> container process come.
> The orphan process do exit after some time, but the period can be very long, 
> and will make the OS status worse. As I observed, the period can be several 
> hours



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082332#comment-15082332
 ] 

Naganarasimha G R commented on YARN-4537:
-

Hi [~rohithsharma],
+1 for the approach.
A few nits on the patch:
* Formatting changes have been applied to lines that are not otherwise modified.
* Instead of {{!(p2 == null)}} we can use *p2 != null*.


> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that down 
> the line if any new ordering policy wants to integrate priority, they can use 
> compound comparator where priority will be high preference. 
> The following changes are expected to be done as part of this JIRA
> # Pull out priority comparison from FifoComparator
> # Define new priority comparator
> # Use compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO
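
A minimal sketch of the compound ordering described above (priority first, then FIFO), using a 
hypothetical simplified app type rather than the scheduler's real SchedulableEntity:

{code}
import java.util.Comparator;

/** Hypothetical, simplified stand-in for a schedulable application. */
class AppInfo {
  final int priority;        // assumption for this sketch: higher value = higher priority
  final long submissionTime; // FIFO tie-breaker
  AppInfo(int priority, long submissionTime) {
    this.priority = priority;
    this.submissionTime = submissionTime;
  }
}

class FifoOrderingSketch {
  /** Priority comparator kept separate, then chained with FIFO, mirroring the plan above. */
  static final Comparator<AppInfo> PRIORITY =
      Comparator.comparingInt((AppInfo a) -> a.priority).reversed();
  static final Comparator<AppInfo> FIFO =
      Comparator.comparingLong(a -> a.submissionTime);
  static final Comparator<AppInfo> PRIORITY_THEN_FIFO = PRIORITY.thenComparing(FIFO);
}
{code}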



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082344#comment-15082344
 ] 

Hadoop QA commented on YARN-4265:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
44s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 19s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server 
no findbugs output file 
(hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/target/findbugsXml.xml) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} Patch generated 9 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 292, now 300). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 25s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server 
no findbugs output file 
(hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/target/findbugsXml.xml) 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 32s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 32 new issues (was 544, now 544). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 56s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 35s {color} 
| {color:red} hadoop-yarn-server in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | 

[jira] [Updated] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements

2016-01-04 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4360:
---
Attachment: YARN-4360.5.patch

> Improve GreedyReservationAgent to support "early" allocations, and 
> performance improvements 
> 
>
> Key: YARN-4360
> URL: https://issues.apache.org/jira/browse/YARN-4360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4360.2.patch, YARN-4360.3.patch, YARN-4360.5.patch, 
> YARN-4360.patch
>
>
> The GreedyReservationAgent allocates "as late as possible". Per various 
> conversations, it seems useful to have a mirror behavior that allocates as 
> early as possible. Also in the process we leverage improvements from 
> YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which 
> significantly speeds up allocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082157#comment-15082157
 ] 

Hadoop QA commented on YARN-4360:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 3m 54s 
{color} | {color:red} Docker failed to build yetus/hadoop:0ca8df7. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780436/YARN-4360.5.patch |
| JIRA Issue | YARN-4360 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10150/console |


This message was automatically generated.



> Improve GreedyReservationAgent to support "early" allocations, and 
> performance improvements 
> 
>
> Key: YARN-4360
> URL: https://issues.apache.org/jira/browse/YARN-4360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4360.2.patch, YARN-4360.3.patch, YARN-4360.5.patch, 
> YARN-4360.patch
>
>
> The GreedyReservationAgent allocates "as late as possible". Per various 
> conversations, it seems useful to have a mirror behavior that allocates as 
> early as possible. Also in the process we leverage improvements from 
> YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which 
> significantly speeds up allocation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4459) container-executor might kill process wrongly

2016-01-04 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082158#comment-15082158
 ] 

Jun Gong commented on YARN-4459:


Thanks [~Naganarasimha] for the review, much appreciated!

{quote}
IIUC existing code checks whether container process has created any sub process 
then kill all the process, else if its a single process then i presume 
kill(-pid,0) will return -1 then it tries to kill only the container process id 
only. Can you confirm this by testing?
{quote}
I do not have a good way to add a test case for it now. I will try to 
explain it with the following test cases:
1. If the container's parent process does not exist but its child process does 
exist, the existing code works well.
{code}
root@bd74d89a2294:/$ cat test.sh 
sleep  &
PID_FOR_SLEEP=$!
PGID_FOR_SLEEP=$(ps -p $PID_FOR_SLEEP -o pgid=)
echo "PID for 'sleep ' : $PID_FOR_SLEEP, its pgid : $PGID_FOR_SLEEP"
root@bd74d89a2294:/$ ./test.sh 
PID for 'sleep ' : 26877, its pgid : 26876
root@bd74d89a2294:/$ ps -ef | grep sleep
root 26877 1  0 08:48 pts/000:00:00 sleep 
root 26880 26797  0 08:48 pts/000:00:00 grep sleep
root@bd74d89a2294:/$ kill -0 -26876
root@bd74d89a2294:/$ kill -15 -26876
root@bd74d89a2294:/$ ps -ef | grep sleep
root 26882 26797  0 08:48 pts/000:00:00 grep sleep
{code}

2. If the container's parent process does not exist and its child process does not 
exist either, the existing code can kill the wrong process.
{code}
root@bd74d89a2294:/$ cat test.sh 
sleep 2 &
PID_FOR_SLEEP=$!
PGID_FOR_SLEEP=$(ps -p $PID_FOR_SLEEP -o pgid=)
echo "PID for 'sleep 2' : $PID_FOR_SLEEP, its pgid : $PGID_FOR_SLEEP"
root@bd74d89a2294:/$ ./test.sh 
PID for 'sleep 2' : 26890, its pgid : 26889
root@bd74d89a2294:/$ ps -ef | grep sleep
root 26893 26797  0 08:56 pts/000:00:00 grep sleep
root@bd74d89a2294:/$ kill -0 26889
-bash: kill: (26889) - No such process
{code}
Now check the existing code in container-executor.c for the above case:
{code}
  if (kill(-pid,0) < 0) {
if (kill(pid, 0) < 0) {
  if (errno == ESRCH) {
return INVALID_CONTAINER_PID;
  }
  fprintf(LOGFILE, "Error signalling container %d with %d - %s\n",
  pid, sig, strerror(errno));
  return -1;
} else {
  has_group = 0;
}
  }

  if (kill((has_group ? -1 : 1) * pid, sig) < 0) {
{code}
*kill(-pid,0)* will return -1. If *pid* has been reused by a new process (suppose 
the container survived for a long time and PID recycling occurred, so this PID 
might be assigned to an unrelated process), *kill(pid, 0)* will return 0, 
*has_group* will be set to 0, and *kill((has_group ? -1 : 1) * pid, sig)* will 
deliver *sig* to that unrelated process. This is the problem.

> container-executor might kill process wrongly
> -
>
> Key: YARN-4459
> URL: https://issues.apache.org/jira/browse/YARN-4459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4459.01.patch, YARN-4459.02.patch
>
>
> When calling 'signal_container_as_user' in container-executor, it first 
> checks whether process group exists, if not, it will kill the process 
> itself(if it the process exists).  It is not reasonable because that the 
> process group does not exist means corresponding container has finished, if 
> we kill the process itself, we just kill wrong process.
> We found it happened in our cluster many times. We used same account for 
> starting NM and submitted app, and container-executor sometimes killed NM(the 
> wrongly killed process might just be a newly started thread and was NM's 
> child process).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3678) DelayedProcessKiller may kill other process other than container

2016-01-04 Thread gu-chi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082201#comment-15082201
 ] 

gu-chi commented on YARN-3678:
--

Same issue, as confirmed with [~hex108].

> DelayedProcessKiller may kill other process other than container
> 
>
> Key: YARN-3678
> URL: https://issues.apache.org/jira/browse/YARN-3678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: gu-chi
>Priority: Critical
>
> Suppose one container finished, then it will do clean up, the PID file still 
> exist and will trigger once singalContainer, this will kill the process with 
> the pid in PID file, but as container already finished, so this PID may be 
> occupied by other process, this may cause serious issue.
> As I know, my NM was killed unexpectedly, what I described can be the cause. 
> Even rarely occur.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-04 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082312#comment-15082312
 ] 

Subru Krishnan commented on YARN-3870:
--

Regarding the ID, I am in principle fine with asking the AM to set it. We do 
have the option of reusing the _responseID_ of *AllocateRequest*, which both the 
RM and AM maintain today. It would be good to also link the _responseID_ to the 
actual allocated container in *AllocateResponse*, as this is a useful hint for 
the AMs. In fact, it has been requested by [~markus.weimer] to simplify certain 
bookkeeping for the [REEF | http://reef.apache.org/ ] AM.

> Providing raw container request information for fine scheduling
> ---
>
> Key: YARN-3870
> URL: https://issues.apache.org/jira/browse/YARN-3870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>Reporter: Lei Guo
>Assignee: Karthik Kambatla
>
> Currently, when AM sends container requests to RM and scheduler, it expands 
> individual container requests into host/rack/any format. For instance, if I 
> am asking for container request with preference "host1, host2, host3", 
> assuming all are in the same rack rack1, instead of sending one raw container 
> request to RM/Scheduler with raw preference list, it basically expand it to 
> become 5 different objects with host1, host2, host3, rack1 and any in there. 
> When scheduler receives information, it basically already lost the raw 
> request. This is ok for single container request, but it will cause trouble 
> when dealing with multiple container requests from the same application. 
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When application requests two containers with different data locality 
> preference:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This will end up with following container request list when client sending 
> request to RM/Scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for scheduler to make a right judgement without 
> knowing the raw container request. The situation will get worse when dealing 
> with affinity and anti-affinity or even gang scheduling etc.
> We need some way to provide raw container request information for fine 
> scheduling purpose.
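
To make the expansion above concrete, a small self-contained sketch (hypothetical helper, not 
the actual AM/RM code) that reproduces the listed counts from the two raw requests c1 and c2:

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RequestExpansionSketch {
  /** Expands each raw preference list into flat host/rack/any counts, losing the grouping. */
  static Map<String, Integer> expand(List<List<String>> rawRequests,
                                     Map<String, String> hostToRack) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (List<String> hosts : rawRequests) {
      Set<String> racks = new HashSet<>();
      for (String h : hosts) {
        counts.merge(h, 1, Integer::sum);
        racks.add(hostToRack.get(h));
      }
      for (String rack : racks) {
        counts.merge(rack, 1, Integer::sum);
      }
      counts.merge("any", 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, String> topology = new HashMap<>();
    topology.put("host1", "rack1"); topology.put("host2", "rack1"); topology.put("host3", "rack1");
    topology.put("host4", "rack2"); topology.put("host5", "rack2"); topology.put("host6", "rack2");
    // c1: host1, host2, host4    c2: host2, host3, host5
    System.out.println(expand(Arrays.asList(
        Arrays.asList("host1", "host2", "host4"),
        Arrays.asList("host2", "host3", "host5")), topology));
    // prints a flat map: host1=1, host2=2, host3=1, host4=1, host5=1, rack1=2, rack2=2, any=2
  }
}
{code}

The output is only the flat host/rack/any count map; the fact that c1 and c2 cannot be 
reconstructed from it is exactly the information loss the description points out.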



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4232) TopCLI console support for HA mode

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082262#comment-15082262
 ] 

Hadoop QA commented on YARN-4232:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 51s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 40s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 2s 
{color} | {color:red} Patch generated 25 new checkstyle issues in root (total 
was 350, now 374). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
24s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
27s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 7s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_66 with JDK v1.8.0_66 
generated 35 new issues (was 100, now 100). {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 7s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 50 new issues (was 100, now 100). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 16s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91
 with JDK v1.7.0_91 generated 1 new issues (was 2, now 2). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 56s 
{color} | 

[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-04 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082301#comment-15082301
 ] 

Li Lu commented on YARN-4224:
-

Ah, one more question: in the original design, /flows returns a list of flows 
with the latest activity on the cluster. Shall we keep this endpoint (as I 
noticed, the /flows endpoint is not touched in this patch), or attach it 
somewhere else in our hierarchy? Thanks! 

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4536) DelayedProcessKiller may not work under heavy workload

2016-01-04 Thread gu-chi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gu-chi resolved YARN-4536.
--
Resolution: Not A Problem

After further analysis, this turned out to be introduced by a custom modification 
on our side. Sorry for the bother.

> DelayedProcessKiller may not work under heavy workload
> --
>
> Key: YARN-4536
> URL: https://issues.apache.org/jira/browse/YARN-4536
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gu-chi
>
> I am now facing with orphan process of container. Here is the scenario:
> With heavy task load, the NM machine CPU usage can reach almost 100%. When 
> some container got event of kill, it will get  {{SIGTERM}} , and then the 
> parent process exit, leave the container process to OS. This container 
> process need handle some shutdown events or some logic, but hardly can get 
> CPU, we suppose to see a {{SIGKILL}} as there is {{DelayedProcessKiller}} 
> ,but the parent process which persisted as container pid no longer exist, so 
> the kill command can not reach the container process. This is how orphan 
> container process come.
> The orphan process do exit after some time, but the period can be very long, 
> and will make the OS status worse. As I observed, the period can be several 
> hours



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-04 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082305#comment-15082305
 ] 

Subru Krishnan commented on YARN-1011:
--

[~kasha], I had an offline discussion with [~curino] and [~chris.douglas] 
regarding auto promotion by the NM. To stay aligned with YARN-2877, we feel it 
would be good if the NM could express its preference to the RM and let the RM 
make the decision, as only the RM can ensure the global invariants based on the 
current state of the cluster. The preference can be based on whether the 
opportunistic container has started, whether its resources have been localized, 
how long it has been running, how much progress it has made, etc.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4542) Cleanup AHS code and configuration

2016-01-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082322#comment-15082322
 ] 

Naganarasimha G R commented on YARN-4542:
-

Shall I take this up, [~djp]?

> Cleanup AHS code and configuration
> --
>
> Key: YARN-4542
> URL: https://issues.apache.org/jira/browse/YARN-4542
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>
> ATS (Application Timeline Server/Service; we already have many versions so 
> far) has been designed and implemented to replace AHS for a long time. We 
> should consider cleaning up AHS-related configuration and code later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2016-01-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4224:
---
Attachment: YARN-4224-feature-YARN-2928.wip.03.patch

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4536) DelayedProcessKiller may not work under heavy workload

2016-01-04 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081122#comment-15081122
 ] 

Jun Gong commented on YARN-4536:


Hi [~gu chi], thanks for reporting the issue.

{quote}but the parent process which persisted as container pid no longer exist, 
so the kill command can not reach the container process.{quote}
Although the parent process does not exist, the corresponding process group does 
exist, and *SIGKILL* will be delivered to the process group, so *SIGKILL* can 
still reach the container's remaining processes. Could you explain it more? Thanks.

> DelayedProcessKiller may not work under heavy workload
> --
>
> Key: YARN-4536
> URL: https://issues.apache.org/jira/browse/YARN-4536
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gu-chi
>
> I am now facing with orphan process of container. Here is the scenario:
> With heavy task load, the NM machine CPU usage can reach almost 100%. When 
> some container got event of kill, it will get  {{SIGTERM}} , and then the 
> parent process exit, leave the container process to OS. This container 
> process need handle some shutdown events or some logic, but hardly can get 
> CPU, we suppose to see a {{SIGKILL}} as there is {{DelayedProcessKiller}} 
> ,but the parent process which persisted as container pid no longer exist, so 
> the kill command can not reach the container process. This is how orphan 
> container process come.
> The orphan process do exit after some time, but the period can be very long, 
> and will make the OS status worse. As I observed, the period can be several 
> hours



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-04 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4537:

Attachment: 0002-YARN-4537.patch

Updated the patch fixing the review comments.

> Pull out priority comparison from fifocomparator and use compound comparator 
> for FifoOrdering policy
> 
>
> Key: YARN-4537
> URL: https://issues.apache.org/jira/browse/YARN-4537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4537.patch, 0002-YARN-4537.patch
>
>
> Currently, priority comparison is integrated with FifoComparator. There 
> should be a separate comparator defined for priority comparison so that down 
> the line if any new ordering policy wants to integrate priority, they can use 
> compound comparator where priority will be high preference. 
> The following changes are expected to be done as part of this JIRA
> # Pull out priority comparison from FifoComparator
> # Define new priority comparator
> # Use compound comparator for FifoOrderingPolicy. Order of preference is 
> Priority, FIFO



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4536) DelayedProcessKiller may not work under heavy workload

2016-01-04 Thread gu-chi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081159#comment-15081159
 ] 

gu-chi commented on YARN-4536:
--

Thanks for the reply. I had not considered the process group; this seems to have 
been introduced by my own change. I added a check on whether the container-executor 
process exists, because I once hit YARN-3678: in my logic, if the parent process 
does not belong to this container, we do not signal kill. I saw you also faced the 
same issue; can your patch handle that scenario without introducing this one?

> DelayedProcessKiller may not work under heavy workload
> --
>
> Key: YARN-4536
> URL: https://issues.apache.org/jira/browse/YARN-4536
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gu-chi
>
> I am now facing with orphan process of container. Here is the scenario:
> With heavy task load, the NM machine CPU usage can reach almost 100%. When 
> some container got event of kill, it will get  {{SIGTERM}} , and then the 
> parent process exit, leave the container process to OS. This container 
> process need handle some shutdown events or some logic, but hardly can get 
> CPU, we suppose to see a {{SIGKILL}} as there is {{DelayedProcessKiller}} 
> ,but the parent process which persisted as container pid no longer exist, so 
> the kill command can not reach the container process. This is how orphan 
> container process come.
> The orphan process do exit after some time, but the period can be very long, 
> and will make the OS status worse. As I observed, the period can be several 
> hours



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081130#comment-15081130
 ] 

Varun Saxena commented on YARN-2902:


Yes, this issue should exist in 2.6 too.
I can help you with backporting it.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.08.patch, YARN-2902.09.patch, 
> YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2016-01-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081150#comment-15081150
 ] 

Junping Du commented on YARN-3893:
--

Thanks [~rohithsharma]!

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this:
> # The capacity scheduler XML is wrongly configured during switchover
> # Refresh ACL failure due to configuration
> # Refresh user-group failure due to configuration
> Both RMs will then continuously try to become active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both web UIs show active
> # Status is shown as active for both RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2016-01-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081194#comment-15081194
 ] 

Sunil G commented on YARN-3849:
---

Yes. [~djp] I will provide a patch for 2.6 now. 

> Too much of preemption activity causing continuos killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
>
>
> Two queues are used, each given a capacity of 0.5, and the Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity
> 2. After an app is submitted in QueueB, there is some demand, and preemption 
> is invoked in QueueA
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is updated demand from the app in QueueA, which lost its 
> containers earlier, and preemption now kicks in on QueueB.
> The scenario in steps 3 and 4 keeps happening in a loop, so none of the 
> apps ever complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4224:
---
Summary: Support fetching entities by UID and change the REST interface to 
conform to current REST APIs' in YARN  (was: Change the ATSv2 reader side REST 
interface to conform to current REST APIs' in YARN)

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081156#comment-15081156
 ] 

Hadoop QA commented on YARN-4479:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 32s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 357, now 353). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 19s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 54s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 56s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 215m 20s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081157#comment-15081157
 ] 

Hadoop QA commented on YARN-4224:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
48s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 23s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 36s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
34s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
53s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 44s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s 
{color} | {color:red} Patch generated 6 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 55, now 57). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 4s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:7c86163 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2016-01-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081186#comment-15081186
 ] 

Junping Du commented on YARN-3849:
--

Mark this JIRA target to 2.6.4 per discussion above.

> Too much of preemption activity causing continuos killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
>
>
> Two queues are used, each given a capacity of 0.5, and the Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity
> 2. After an app is submitted in QueueB, there is some demand, and preemption 
> is invoked in QueueA
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is updated demand from the app in QueueA, which lost its 
> containers earlier, and preemption now kicks in on QueueB.
> The scenario in steps 3 and 4 keeps happening in a loop, so none of the 
> apps ever complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues

2016-01-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3849:
-
Target Version/s: 2.6.4

> Too much of preemption activity causing continuos killing of containers 
> across queues
> -
>
> Key: YARN-3849
> URL: https://issues.apache.org/jira/browse/YARN-3849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, 
> 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch
>
>
> Two queues are used, each given a capacity of 0.5, and the Dominant 
> Resource policy is used.
> 1. An app is submitted in QueueA and consumes the full cluster capacity
> 2. After an app is submitted in QueueB, there is some demand, and preemption 
> is invoked in QueueA
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we 
> observed that all containers other than the AM get killed in QueueA
> 4. Now the app in QueueB tries to take over the cluster with the current free 
> space. But there is updated demand from the app in QueueA, which lost its 
> containers earlier, and preemption now kicks in on QueueB.
> The scenario in steps 3 and 4 keeps happening in a loop, so none of the 
> apps ever complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4535) Fix checkstyle error in CapacityScheduler.java

2016-01-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082568#comment-15082568
 ] 

Naganarasimha G R commented on YARN-4535:
-

thanks [~rohithsharma] for review and commit !

> Fix checkstyle error in CapacityScheduler.java
> --
>
> Key: YARN-4535
> URL: https://issues.apache.org/jira/browse/YARN-4535
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Naganarasimha G R
>Priority: Trivial
> Fix For: 2.9.0
>
> Attachments: YARN-4535.v1.001.patch
>
>
> In the code below from *CS#parseQueue*, the expression can be simplified: 
> {{queue instanceof LeafQueue == true}} and {{queues.get(queueName) instanceof 
> LeafQueue == true}} need not compare against {{true}}
> {code}
>  if(queue instanceof LeafQueue == true && queues.containsKey(queueName)
>   && queues.get(queueName) instanceof LeafQueue == true) {
>   throw new IOException("Two leaf queues were named " + queueName
> + ". Leaf queue names must be distinct");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2016-01-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3446:

Attachment: YARN-3446.004.patch

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because for Reducer preemption 
> calculation, headRoom is considering blacklisted nodes. This makes jobs to 
> hang forever(ResourceManager does not assign any new containers on 
> blacklisted nodes but availableResource AM get from RM includes blacklisted 
> nodes available resource).
> This issue is similar as YARN-1680 which is for Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4535) Fix checkstyle error in CapacityScheduler.java

2016-01-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082530#comment-15082530
 ] 

Hudson commented on YARN-4535:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9048 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9048/])
YARN-4535. Fix checkstyle error in CapacityScheduler.java (Naganarasimha 
(rohithsharmaks: rev 6da6d87872de518bb2583f65c9595f2090c855d7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> Fix checkstyle error in CapacityScheduler.java
> --
>
> Key: YARN-4535
> URL: https://issues.apache.org/jira/browse/YARN-4535
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Assignee: Naganarasimha G R
>Priority: Trivial
> Fix For: 2.9.0
>
> Attachments: YARN-4535.v1.001.patch
>
>
> In the code below from *CS#parseQueue*, the expression can be simplified: 
> {{queue instanceof LeafQueue == true}} and {{queues.get(queueName) instanceof 
> LeafQueue == true}} need not compare against {{true}}
> {code}
>  if(queue instanceof LeafQueue == true && queues.containsKey(queueName)
>   && queues.get(queueName) instanceof LeafQueue == true) {
>   throw new IOException("Two leaf queues were named " + queueName
> + ". Leaf queue names must be distinct");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2016-01-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082460#comment-15082460
 ] 

Varun Saxena commented on YARN-2902:


bq. Does that means if we need to pull that fix, the patch here need to be 
updated?
Yes, the patch would have to be updated with changes in container-executor.c

Yeah, there were a few conflicts. I will reopen and run QA after a decision on 
YARN-3089.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.7.2
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-04 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082564#comment-15082564
 ] 

Bibin A Chundatt commented on YARN-4538:


[~sunilg]/[~rohithsharma]

Thank you for looking into the issue. I am not sure it's the same issue. 
Currently the problem appears to be in {{QueueMetrics#_decrPendingResources}} and 
{{QueueMetrics#_incrPendingResources}}:

{noformat}
  private void _decrPendingResources(int containers, Resource res) {
// if #container = 0, means change container resource
pendingContainers.decr(containers);
pendingMB.decr(res.getMemory() * Math.max(containers, 1));
pendingVCores.decr(res.getVirtualCores() * Math.max(containers, 1));
  }

  private void _incrPendingResources(int containers, Resource res) {
pendingContainers.incr(containers);
pendingMB.incr(res.getMemory() * containers);
pendingVCores.incr(res.getVirtualCores() * containers);
  }
{noformat}

For increase and decrease the logic looks different.

{noformat}
[{Priority: 20, Capability: , # Containers: 1, Location: 
*, Relax Locality: true, Node Label Expression: }]  
[{Priority: 20, Capability: , # Containers: 0, Location: 
*, Relax Locality: true, Node Label Expression: }]  
{noformat}

The resource requests seen in {{AppSchedulingInfo#updateResourceRequests}} are as 
shown above, and they cause pendingMB and pendingVCores to go negative.
Whenever an allocation request carries a non-zero capability with #containers = 0, 
the pending resource calculation goes wrong.

Looks related to YARN-1651. 
I will make {{QueueMetrics#_incrPendingResources}} consistent with 
{{QueueMetrics#_decrPendingResources}} and upload a patch soon (a sketch of the 
symmetric version is below).

Thoughts??
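
To make that concrete, here is a minimal standalone sketch of the symmetric version 
I have in mind. It is an illustration only, not the actual QueueMetrics code: plain 
{{long}} fields stand in for the Hadoop metrics gauges, and the leading underscores 
are dropped from the method names.

{code}
// Toy version of the two pending-resource paths; plain longs stand in for the
// real metrics gauges, so this only illustrates the proposed symmetry.
public class PendingResourceSketch {
  private long pendingContainers;
  private long pendingMB;
  private long pendingVCores;

  // Mirrors the _decrPendingResources snippet above: a request with
  // #containers == 0 (a pure resource change) still adjusts MB/vcores once.
  void decrPendingResources(int containers, long memoryMB, long vCores) {
    pendingContainers -= containers;
    pendingMB -= memoryMB * Math.max(containers, 1);
    pendingVCores -= vCores * Math.max(containers, 1);
  }

  // Proposed symmetric increase: the same Math.max(containers, 1) guard, so
  // the #containers == 0 case adds exactly what the decrease later removes.
  void incrPendingResources(int containers, long memoryMB, long vCores) {
    pendingContainers += containers;
    pendingMB += memoryMB * Math.max(containers, 1);
    pendingVCores += vCores * Math.max(containers, 1);
  }
}
{code}

With both paths guarded the same way, a request that only changes a container's 
resource adds and later removes the same amount, so the pending metrics can no 
longer drift negative through this path.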

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Submit 2 applications to the default queue. 
> Check the queue metrics for pending cores and memory:
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-04 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4538:
---
Priority: Critical  (was: Major)

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> Submit 2 applications to the default queue. 
> Check the queue metrics for pending cores and memory:
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output :*
> -20
> -20480



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2016-01-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082601#comment-15082601
 ] 

zhihai xu commented on YARN-3446:
-

thanks for the review! Just updated the patch at YARN-3446.004.patch.

> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because for Reducer preemption 
> calculation, headRoom is considering blacklisted nodes. This makes jobs to 
> hang forever(ResourceManager does not assign any new containers on 
> blacklisted nodes but availableResource AM get from RM includes blacklisted 
> nodes available resource).
> This issue is similar as YARN-1680 which is for Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4528) decreaseContainer Message maybe lost if NM restart

2016-01-04 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081267#comment-15081267
 ] 

MENG DING commented on YARN-4528:
-

Hi, [~sandflee]

I am not quite sure about the benefit of directly decreasing the resource in the 
NM (point #2 in your comment). The targetResource is already persisted in the NM 
state store for NM recovery, and the RM does not need to check the status of the 
NM-side decrease anyway; see the toy sketch after the snippet below.
{code}
// Persist container resource change for recovery
this.context.getNMStateStore().storeContainerResourceChanged(
containerId, targetResource);
{code}
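
For context, a toy stand-in for the recovery idea referenced above. This is not the 
real NMStateStoreService API (which is richer and typed differently); it only shows 
the shape of "persist the target resource when the decrease is applied, read it back 
on restart", which is why the in-flight decrease message itself does not have to 
survive an NM restart.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the NM state store.
class MiniContainerStateStore {
  private final Map<String, Long> targetMemoryMB = new HashMap<>();

  // Analogous to storeContainerResourceChanged(containerId, targetResource)
  // in the snippet above.
  void storeContainerResourceChanged(String containerId, long memoryMB) {
    targetMemoryMB.put(containerId, memoryMB);
  }

  // On NM restart, recovery reads the persisted target back instead of
  // depending on a decrease message that may have been lost.
  Long recoverTargetMemoryMB(String containerId) {
    return targetMemoryMB.get(containerId);
  }
}
{code}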


> decreaseContainer Message maybe lost if NM restart
> --
>
> Key: YARN-4528
> URL: https://issues.apache.org/jira/browse/YARN-4528
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
> Attachments: YARN-4528.01.patch
>
>
> We could keep the container decrease message pending until the next heartbeat, 
> or check the resource against the rmContainer when the node registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-04 Thread tangshangwen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081292#comment-15081292
 ] 

tangshangwen commented on YARN-4539:


I think we should check whether the asyncDispatcher is null before stopping it; a 
minimal sketch of the guard follows the snippet below.
{code:title=CommonNodeLabelsManager.java|borderStyle=solid}
// for UT purpose
  protected void stopDispatcher() {
AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;  
asyncDispatcher.stop();
  }
{code}
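
Roughly what I have in mind, as an illustrative sketch rather than the actual patch; 
the wrapper class and field below are stand-ins for the relevant parts of 
CommonNodeLabelsManager.

{code}
import org.apache.hadoop.yarn.event.AsyncDispatcher;
import org.apache.hadoop.yarn.event.Dispatcher;

// Sketch only: guard the cast/stop so serviceStop does not NPE when
// serviceInit failed before the dispatcher was ever created.
class NullSafeDispatcherStop {
  private Dispatcher dispatcher; // may still be null if init failed early

  protected void stopDispatcher() {
    AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
    if (asyncDispatcher != null) {
      asyncDispatcher.stop();
    }
  }
}
{code}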

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
>
> When scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2016-01-04 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081242#comment-15081242
 ] 

Lei Guo commented on YARN-3870:
---

I am not against combining this JIRA and YARN-371; there is common ground between 
the two JIRAs, and most likely the final technical solution will be a single one 
that covers both, though that is not strictly necessary. Maybe we can view YARN-371 
as a technical speculation and YARN-3870 as one related use case (if YARN-371 is 
resolved, YARN-3870 should be covered).

From another angle, YARN-3870 could be resolved via approaches without an ID. 
Scheduling cares more about the current snapshot of resource requests from 
applications. The ID is not mandatory: as long as the snapshot provides detailed 
resource request information, the scheduler can do fine-grained scheduling. The ID 
will mainly help to prevent and handle issues arising from the asynchronous 
protocol.

> Providing raw container request information for fine scheduling
> ---
>
> Key: YARN-3870
> URL: https://issues.apache.org/jira/browse/YARN-3870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>Reporter: Lei Guo
>Assignee: Karthik Kambatla
>
> Currently, when the AM sends container requests to the RM and scheduler, it 
> expands each individual container request into host/rack/any form. For 
> instance, if I ask for a container with the preference "host1, host2, host3", 
> all in the same rack rack1, then instead of sending one raw container request 
> with that preference list to the RM/scheduler, it expands it into 5 different 
> objects: host1, host2, host3, rack1 and any. By the time the scheduler 
> receives the information, the raw request is already lost. This is fine for a 
> single container request, but it causes trouble when dealing with multiple 
> container requests from the same application. 
> Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3) rack2 (host4, host5, host6)
> When an application requests two containers with different data-locality 
> preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This ends up as the following container request list when the client sends the 
> request to the RM/scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgement without 
> knowing the raw container requests. The situation gets worse when dealing with 
> affinity, anti-affinity or even gang scheduling, etc.
> We need some way to provide raw container request information for fine-grained 
> scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-04 Thread tangshangwen (JIRA)
tangshangwen created YARN-4539:
--

 Summary: CommonNodeLabelsManager throw NullPointerException when 
the fairScheduler init failed
 Key: YARN-4539
 URL: https://issues.apache.org/jira/browse/YARN-4539
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: tangshangwen
Assignee: tangshangwen


When scheduler initialization fails and the RM stops the CompositeService, the 
CommonNodeLabelsManager throws a NullPointerException.
{noformat}
2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: Service 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
failed in state INITED; cause: java.io.IOException: Failed to initialize 
FairScheduler
java.io.IOException: Failed to initialize FairScheduler
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)

2016-01-04 22:19:52,193 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Stopping ResourceManager metrics system...
2016-01-04 22:19:52,194 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
ResourceManager metrics system stopped.
2016-01-04 22:19:52,194 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
ResourceManager metrics system shutdown complete.
2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
AsyncDispatcher is draining to stop, igonring any new events.
2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: Service 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state 
STOPPED; cause: java.lang.NullPointerException
java.lang.NullPointerException
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN

2016-01-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081294#comment-15081294
 ] 

Varun Saxena commented on YARN-4224:


The WIP patch addresses a few of the comments above.

The following points need to be decided upon. We can do so in the weekly meeting.
# Whether the entity type should be an optional or a mandatory param.
# Whether the UID key and delimiter should be configurable, or just documented.

I still have to update the javadoc. I will update it based on the decisions on the 
above points.

> Support fetching entities by UID and change the REST interface to conform to 
> current REST APIs' in YARN
> ---
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch, 
> YARN-4224-feature-YARN-2928.wip.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081263#comment-15081263
 ] 

Hadoop QA commented on YARN-4537:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 20s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 16s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 5s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 43s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 212m 56s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | 

[jira] [Updated] (YARN-4539) CommonNodeLabelsManager throw NullPointerException when the fairScheduler init failed

2016-01-04 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen updated YARN-4539:
---
Attachment: YARN-4539.1.patch

> CommonNodeLabelsManager throw NullPointerException when the fairScheduler 
> init failed
> -
>
> Key: YARN-4539
> URL: https://issues.apache.org/jira/browse/YARN-4539
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
> Attachments: YARN-4539.1.patch
>
>
> When scheduler initialization fails and the RM stops the CompositeService, 
> the CommonNodeLabelsManager throws a NullPointerException.
> {noformat}
> 2016-01-04 22:19:52,190 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler 
> failed in state INITED; cause: java.io.IOException: Failed to initialize 
> FairScheduler
> java.io.IOException: Failed to initialize FairScheduler
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1377)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1394)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> 
> 2016-01-04 22:19:52,193 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager 
> metrics system...
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system stopped.
> 2016-01-04 22:19:52,194 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics 
> system shutdown complete.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher is draining to stop, igonring any new events.
> 2016-01-04 22:19:52,194 INFO org.apache.hadoop.service.AbstractService: 
> Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
> state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4528) decreaseContainer Message maybe lost if NM restart

2016-01-04 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081322#comment-15081322
 ] 

sandflee commented on YARN-4528:


Hi [~mding], the container decrease message is passed the same way the 
container-complete message is passed from RM to AM, so a successful nodeHeartbeat 
must ensure that the container decrease message has been persisted in the NM state 
store. A toy sketch of that ack-on-next-heartbeat pattern follows the snippet below.

{code:title=RMAppAttemptImpl.java # pullJustFinishContainers}
  // A new allocate means the AM received the previously sent
  // finishedContainers. We can ack this to NM now
  sendFinishedContainersToNM();
{code}
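
To spell out that ack-on-next-heartbeat pattern, here is a toy sketch. It is not the 
actual RM/NM code; it only shows the shape of "keep resending the decrease until the 
next heartbeat proves the previous response was received and persisted".

{code}
import java.util.ArrayList;
import java.util.List;

// Toy tracker: decreases stay pending until the node's next heartbeat
// implicitly acknowledges the previous heartbeat response.
class PendingDecreaseTracker<T> {
  private final List<T> pending = new ArrayList<>();

  synchronized void onDecreaseIssued(T decrease) {
    pending.add(decrease);
  }

  // Every heartbeat response carries all decreases not yet acknowledged.
  synchronized List<T> buildHeartbeatPayload() {
    return new ArrayList<>(pending);
  }

  // The arrival of the next heartbeat is the only proof that the previous
  // response (and its decrease messages) was received, so ack happens here.
  synchronized void onNextHeartbeat(List<T> previouslySent) {
    pending.removeAll(previouslySent);
  }
}
{code}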

> decreaseContainer Message maybe lost if NM restart
> --
>
> Key: YARN-4528
> URL: https://issues.apache.org/jira/browse/YARN-4528
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
> Attachments: YARN-4528.01.patch
>
>
> We could keep the container decrease message pending until the next heartbeat, 
> or check the resource against the rmContainer when the node registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-04 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1011:
---
Attachment: yarn-1011-design-v1.pdf

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4537) Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081331#comment-15081331
 ] 

Hadoop QA commented on YARN-4537:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 25s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 31s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 51s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 53s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 198m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || 

[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2016-01-04 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081353#comment-15081353
 ] 

Carlo Curino commented on YARN-4195:


[~subru] thanks for the review. I agree with most of what you say.

 * The RMNodeLabel was more of a future-proofing measure. That class contains more 
info regarding the actual nodes carrying that label etc., which we might want to 
leverage in the future. At the same time, other ongoing work points to using the 
RLESparseResourceAllocation to represent the amount of resources available (to 
allow representing time-varying plan capacity). As such, I second your suggestion 
to switch to Resource (or maybe RLE directly).
 * We need to wait for YARN-4359 and YARN-4360 as well to settle, but given 
that work I agree we could remove a few calls from PlanView. (I will add the 
dependencies)
 * As I mentioned in person, the initial attempt (which I am not very happy with) 
was to follow what happens in the rest of the CS, i.e., the old method signature 
(without any label specified) implicitly refers to what should happen for the 
NO_LABEL partition. I think this makes sense only temporarily and for legacy 
purposes. Since in the ReservationSystem we have less legacy usage, I agree with 
your proposal to switch to explicit use of labels at all times, where one can 
specify ALL or NO_LABEL explicitly. Regarding the semantics of a null label 
expression, I am leaning towards interpreting it as "no constraints", and 
therefore as meaning "any label is fine", rather than the implicit NO_LABEL-only 
semantics carried over from the CS. Thoughts? 

ACK on all the minor comments.

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2016-01-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081359#comment-15081359
 ] 

Karthik Kambatla commented on YARN-1856:


Was catching up on YARN-3 (the JIRA that added cgroups) to see why we decided 
to not use it for enforcing memory. [~bikassaha] has some [valid 
points|https://issues.apache.org/jira/browse/YARN-3?focusedCommentId=13414567=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13414567]
 on not letting the kernel (through cgroups) kill processes that go over their 
allocated limits.

To get the best of both worlds: I feel we should disable oom_control so the 
processes are paused but not killed. Thoughts? 
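
For reference, at the cgroups v1 level disabling the OOM killer for a group is just 
a write of "1" to its memory.oom_control file; tasks that then hit the memory limit 
are paused under OOM instead of being killed, leaving the kill-or-continue decision 
to whoever is watching the cgroup. A minimal illustration follows; the path is only 
an example of where a container's memory cgroup might live, not how the NM 
configures this today.

{code}
import java.io.FileWriter;
import java.io.IOException;

// Illustration only: disable the cgroups v1 OOM killer for one memory cgroup.
public class DisableOomKillerExample {
  public static void main(String[] args) throws IOException {
    // Example path; the real container cgroup location depends on NM config.
    String cgroupDir = "/sys/fs/cgroup/memory/hadoop-yarn/container_example";
    try (FileWriter w = new FileWriter(cgroupDir + "/memory.oom_control")) {
      w.write("1\n"); // 1 = OOM killer disabled; over-limit tasks pause
    }
  }
}
{code}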

> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Varun Vasudev
> Fix For: 2.9.0
>
> Attachments: YARN-1856.001.patch, YARN-1856.002.patch, 
> YARN-1856.003.patch, YARN-1856.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4232) TopCLI console support for HA mode

2016-01-04 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4232:
---
Attachment: (was: 0002-YARN-4232.patch)

> TopCLI console support for HA mode
> --
>
> Key: YARN-4232
> URL: https://issues.apache.org/jira/browse/YARN-4232
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4232.patch
>
>
> *Steps to reproduce*
> Start Top command in YARN in HA mode
> ./yarn top
> {noformat}
> usage: yarn top
>  -cols  Number of columns on the terminal
>  -delay The refresh delay(in seconds), default is 3 seconds
>  -help   Print usage; for help while the tool is running press 'h'
>  + Enter
>  -queues Comma separated list of queues to restrict applications
>  -rows  Number of rows on the terminal
>  -types Comma separated list of types to restrict applications,
>  case sensitive(though the display is lower case)
>  -users Comma separated list of users to restrict applications
> {noformat}
> Execute *for help while the tool is running press 'h'  + Enter* while top 
> tool is running
> Exception is thrown in console continuously
> {noformat}
> 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time
> java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at java.net.Socket.connect(Socket.java:538)
> at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
> at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
> at sun.net.www.http.HttpClient.New(HttpClient.java:308)
> at sun.net.www.http.HttpClient.New(HttpClient.java:326)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
> at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
> at 
> org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742)
> at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4232) TopCLI console support for HA mode

2016-01-04 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4232:
---
Attachment: 0002-YARN-4232.patch

[~djp] / [~rohithsharma]

Attaching the patch again to trigger CI. Could you please review the attached patch?
# Cluster info proto added to get the cluster start time in the YARN CLI (a rough 
sketch follows below)
# Earlier the HTTP REST API was used to get the RM start time, which causes 
failures in secure mode and HA
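
A hedged sketch of that direction (the cluster-start-time call added by the patch 
is shown only as an assumed, commented-out line since it is not in the released 
YarnClient API): fetch the information over the client RPC layer, which already 
handles RM HA failover and Kerberos, instead of a single RM's HTTP REST endpoint.

{code}
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: getYarnClusterMetrics() stands in for the RPC round trip.
public class ClusterStartTimeSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());  // picks up HA/Kerberos settings
    client.start();
    try {
      // Any RPC through YarnClient already handles RM failover and security.
      System.out.println("Active NMs: "
          + client.getYarnClusterMetrics().getNumNodeManagers());
      // Assumed call from the patch (illustrative name only):
      // long rmStartTime = client.getClusterInfo().getStartTime();
    } finally {
      client.stop();
    }
  }
}
{code}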

> TopCLI console support for HA mode
> --
>
> Key: YARN-4232
> URL: https://issues.apache.org/jira/browse/YARN-4232
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4232.patch, 0002-YARN-4232.patch
>
>
> *Steps to reproduce*
> Start Top command in YARN in HA mode
> ./yarn top
> {noformat}
> usage: yarn top
>  -cols  Number of columns on the terminal
>  -delay The refresh delay(in seconds), default is 3 seconds
>  -help   Print usage; for help while the tool is running press 'h'
>  + Enter
>  -queues Comma separated list of queues to restrict applications
>  -rows  Number of rows on the terminal
>  -types Comma separated list of types to restrict applications,
>  case sensitive(though the display is lower case)
>  -users Comma separated list of users to restrict applications
> {noformat}
> Execute *for help while the tool is running press 'h'  + Enter* while top 
> tool is running
> Exception is thrown in console continuously
> {noformat}
> 15/10/07 14:59:28 ERROR cli.TopCLI: Could not fetch RM start time
> java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
> at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at java.net.Socket.connect(Socket.java:538)
> at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
> at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
> at sun.net.www.http.HttpClient.New(HttpClient.java:308)
> at sun.net.www.http.HttpClient.New(HttpClient.java:326)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
> at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
> at 
> org.apache.hadoop.yarn.client.cli.TopCLI.getRMStartTime(TopCLI.java:742)
> at org.apache.hadoop.yarn.client.cli.TopCLI.run(TopCLI.java:467)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.yarn.client.cli.TopCLI.main(TopCLI.java:420)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4541) Change log message in LocalizedResource#handle() to DEBUG

2016-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081945#comment-15081945
 ] 

Hadoop QA commented on YARN-4541:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 9m 1s {color} 
| {color:red} Docker failed to build yetus/hadoop:0ca8df7. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12780390/YARN-4541.001.patch |
| JIRA Issue | YARN-4541 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10148/console |


This message was automatically generated.



> Change log message in LocalizedResource#handle() to DEBUG
> -
>
> Key: YARN-4541
> URL: https://issues.apache.org/jira/browse/YARN-4541
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
> Attachments: YARN-4541.001.patch
>
>
> This section of code can fill up a log fairly quickly.
> {code}
>if (oldState != newState) {
> LOG.info("Resource " + resourcePath + (localPath != null ?
>   "(->" + localPath + ")": "") + " transitioned from " + oldState
> + " to " + newState);
>}
> {code}
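
A minimal sketch of the requested change (not the attached patch; LOG and the 
surrounding fields come from the existing class): emit the same message at DEBUG 
and guard it so the string concatenation is skipped when DEBUG logging is off.

{code}
if (oldState != newState && LOG.isDebugEnabled()) {
  // Same message as before, but only built and emitted at DEBUG level.
  LOG.debug("Resource " + resourcePath + (localPath != null
      ? "(->" + localPath + ")" : "") + " transitioned from " + oldState
      + " to " + newState);
}
{code}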



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4528) decreaseContainer Message maybe lost if NM restart

2016-01-04 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081969#comment-15081969
 ] 

sandflee commented on YARN-4528:


Thanks [~mding]. Yes, this could happen, but rarely. Should this affect the 
design?

> decreaseContainer Message maybe lost if NM restart
> --
>
> Key: YARN-4528
> URL: https://issues.apache.org/jira/browse/YARN-4528
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
> Attachments: YARN-4528.01.patch
>
>
> We may pend the container decrease msg until the next heartbeat, or check the 
> resource with the rmContainer when the node registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

2016-01-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081472#comment-15081472
 ] 

Junping Du commented on YARN-4265:
--

Thanks [~gtCarrera9] for the clarification. Assuming Jason is fine with going 
ahead with this patch, I quickly went through the v3 patch and have the following 
comments so far (I haven't finished my review yet as the patch is pretty big):

In YarnConfiguration.java,
{code}
TIMELINE_SERVICE_ENTITYGROUP_FS_STORE_SCAN_INTERVAL_SECONDS_DEFAULT = 60;
{code}
I noticed that we are setting 1 minute as the default scan interval, but the 
original patch in HDFS-3942 uses 5 minutes. Why do we change it here? The same 
question for "app-cache-size": the default value in HDFS-3942 is 5, but here it 
is 10. Any reason to update the value?

In yarn-default.xml,
{code}
+DFS path to store active application’s timeline 
data
...
+DFS path to store done application’s timeline 
data
{code}
DFS is a very old name; use HDFS instead to be clearer.

Why don't we have a default value specified for the property 
"yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes"?

In hadoop-yarn-server-timeline-pluginstorage/pom.xml,


For EmptyTimelineEntityGroupPlugin.java, why do we need this class? I don't see it 
providing any help, even in tests. We should remove it if it is not useful.

In EntityCacheItem.java,
We should have a description for this class in Javadoc.

Can we optimize the synchronization logic here? For example, in the synchronized 
method refreshCache we initialize/start/stop the TimelineDataManager (and 
MemoryTimelineStore), which is quite expensive and unnecessarily blocks other 
synchronized operations. Shall we move these operations out of the synchronized 
block?
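
A generic sketch of that suggestion (hypothetical names, not the actual 
EntityCacheItem code): do the expensive construction and teardown outside the 
lock, and only swap the reference inside the synchronized section.

{code}
// Hypothetical sketch: Store/StoreFactory stand in for the expensive
// TimelineDataManager and MemoryTimelineStore lifecycle.
public class CacheItemSketch {
  public interface Store { void stop(); }
  public interface StoreFactory { Store createAndStart(); }

  private Store store;

  public void refreshCache(StoreFactory factory) {
    Store freshStore = factory.createAndStart(); // expensive, done without the lock
    Store oldStore;
    synchronized (this) {
      oldStore = this.store;     // cheap reference swap under the lock
      this.store = freshStore;
    }
    if (oldStore != null) {
      oldStore.stop();           // expensive teardown, also outside the lock
    }
  }
}
{code}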

{code}
+  LOG.warn("Error closing datamanager", e);
{code}
I think we are closing the store here rather than the data manager, aren't we?

{code}
+  public boolean needRefresh() {
+//TODO: make a config for cache freshness
+return (Time.monotonicNow() - lastRefresh > 1);
+  }
{code}
Does the refresh interval here need any coordination with the scan interval 
specified in 
"yarn.timeline-service.entity-group-fs-store.scan-interval-seconds"?

In EntityGroupFSTimelineStore.java,

{code}
+  if (appState != AppState.UNKNOWN) {
+appLogs = new AppLogs(applicationId, appDirPath, appState);
+LOG.debug("Create and try to add new appLogs to appIdLogMap for {}",
+applicationId);
+AppLogs oldAppLogs = appIdLogMap.putIfAbsent(applicationId, appLogs);
+if (oldAppLogs != null) {
+  appLogs = oldAppLogs;
+}
+  }
{code}
This logic is very similar to the getAndSetActiveLog() method. Can we 
consolidate them?

If parseSummaryLogs() is synchronized, it seems getSummaryLogs() should be 
synchronized too, or the getter may return a stale (half-done) result (see the 
small sketch below).
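
A small sketch of that visibility concern (hypothetical fields, not the actual 
class): the reader takes the same lock as the writer and returns a defensive 
copy, so it never observes a half-updated list.

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the writer/reader pairing discussed above.
public class SummaryLogsSketch {
  private final List<String> summaryLogs = new ArrayList<String>();

  public synchronized void parseSummaryLogs(List<String> parsed) {
    summaryLogs.addAll(parsed);                // mutation under the object lock
  }

  public synchronized List<String> getSummaryLogs() {
    return new ArrayList<String>(summaryLogs); // same lock, defensive copy
  }
}
{code}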

Still checking for other multi-threading issues. More comments will come soon.

> Provide new timeline plugin storage to support fine-grained entity caching
> --
>
> Key: YARN-4265
> URL: https://issues.apache.org/jira/browse/YARN-4265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4265-trunk.001.patch, 
> YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new 
> plugin timeline store. The store may have similar behavior as the 
> EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id 
> granularity, instead of application id granularity. Let's have this storage 
> as a standalone one, instead of updating EntityFileTimelineStore, to keep the 
> existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4540) Yarn property "yarn.nodemanager.localizer.cache.cleanup.interval-ms" not working as expected

2016-01-04 Thread Mark S (JIRA)
Mark S created YARN-4540:


 Summary: Yarn property 
"yarn.nodemanager.localizer.cache.cleanup.interval-ms" not working as expected
 Key: YARN-4540
 URL: https://issues.apache.org/jira/browse/YARN-4540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
 Environment: Ambari – Version 2.1.2.1
YARN - Version 2.7.1.2.3
HDP - Version 2.3.2.0
Reporter: Mark S
Priority: Minor


I manually added specific YARN configuration to reduce my YARN cache size; 
however, the cache does not get cleaned up within the interval specified by:
{code}
yarn.nodemanager.localizer.cache.cleanup.interval-ms
{code}
I did notice that eventually my server did reduce my YARN cache size over the 
weekend.


h5. Set Values (Should update size in 1 minute):
{code}
yarn.nodemanager.localizer.cache.target-size-mb=4096
yarn.nodemanager.localizer.cache.cleanup.interval-ms=60000
{code}

h5. Default values (Should update size in 10 minutes):
{code}
yarn.nodemanager.localizer.cache.target-size-mb=10240
yarn.nodemanager.localizer.cache.cleanup.interval-ms=600000
{code}

h5.  Confirming YARN cache size
{code}
date && du -m /hadoop/yarn | sort -nr | head -n 20
#date && du -m . | sort -nr | head -n 20
{code}

h5. See also:
* 
https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
* http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4540) Yarn property "yarn.nodemanager.localizer.cache.cleanup.interval-ms" not working as expected

2016-01-04 Thread Mark S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark S updated YARN-4540:
-
Component/s: yarn

> Yarn property "yarn.nodemanager.localizer.cache.cleanup.interval-ms" not 
> working as expected
> 
>
> Key: YARN-4540
> URL: https://issues.apache.org/jira/browse/YARN-4540
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: Ambari – Version 2.1.2.1
> YARN - Version 2.7.1.2.3
> HDP - Version 2.3.2.0
>Reporter: Mark S
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I manually added specific YARN configuration to reduce my YARN cache size; 
> however, the cache does not get cleaned up within the interval specified by:
> {code}
> yarn.nodemanager.localizer.cache.cleanup.interval-ms
> {code}
> I did notice that eventually my server did reduce my YARN cache size over the 
> weekend.
> h5. Set Values (Should update size in 1 minute):
> {code}
> yarn.nodemanager.localizer.cache.target-size-mb=4096
> yarn.nodemanager.localizer.cache.cleanup.interval-ms=60000
> {code}
> h5. Default values (Should update size in 10 minutes):
> {code}
> yarn.nodemanager.localizer.cache.target-size-mb=10240
> yarn.nodemanager.localizer.cache.cleanup.interval-ms=600000
> {code}
> h5.  Confirming YARN cache size
> {code}
> date && du -m /hadoop/yarn | sort -nr | head -n 20
> #date && du -m . | sort -nr | head -n 20
> {code}
> h5. See also:
> * 
> https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> * http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

