[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394149#comment-14394149
 ] 

Naganarasimha G R commented on YARN-2729:
-


bq. Revisited interval, I think it's better to make it a provider configuration instead of a script-provider-only configuration, since config/script will share it (I remember I have some back-and-forth opinions here).
:) Agree, I don't mind redoing it as long as it's for a good reason, and I was expecting changes here anyway.
I will also address the other comments on configuration.

bq. I feel like ScriptBased and ConfigBased can share some implementations; they will all init a timer task, get the interval and run, check timeout (meaningless for config-based), etc. Can you make an abstract class that is inherited by ScriptBased?
I can do this (which I feel is correct), but if we do it then it might not be possible to generalize much between NodeHealthScriptRunner and ScriptBasedNodeLabelsProvider, which I feel should be OK.
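For illustration, a rough sketch of such a shared base class (all names and structure here are hypothetical, not the actual patch code):
{code}
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch of the shared base discussed above: common interval/timer
// handling, with only the label-fetching logic left to subclasses.
public abstract class AbstractNodeLabelsProvider {

  private final long intervalMs;          // provider-level fetch interval
  private Timer timer;
  protected volatile Set<String> nodeLabels;

  protected AbstractNodeLabelsProvider(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  // Shared timer setup used by both config-based and script-based providers.
  public void start() {
    timer = new Timer("NodeLabelsProviderTimer", true);
    timer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        nodeLabels = fetchNodeLabels();
      }
    }, 0, intervalMs);
  }

  public void stop() {
    if (timer != null) {
      timer.cancel();
    }
  }

  public Set<String> getNodeLabels() {
    return nodeLabels;
  }

  // Script-based: run the script with a timeout and parse its output.
  // Config-based: simply re-read the configured labels.
  protected abstract Set<String> fetchNodeLabels();
}
{code}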

bq. checkAndThrowLabelName should be called in NodeStatusUpdaterImpl
In a way it would be better in NodeStatusUpdaterImpl, since we support an external class as the provider, but earlier I thought it would not be good to add additional checks to the heartbeat flow.

bq. label needs to be trim()'d when calling checkAndThrowLabelName(...)
Not required, as checkAndThrowLabelName takes care of it, but a test case for this is missing; I will add one for NodeStatusUpdaterImpl.
I will rework the other issues in the next patch.

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3334:
-
Attachment: YARN-3334-v8.patch

Uploaded v8 patch to address minor comments on logging in TimelineClientImpl.

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334-v8.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-04-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394059#comment-14394059
 ] 

Naganarasimha G R commented on YARN-3390:
-

Thanks for the feedback [~zjshen] & [~sjlee0],
bq. either pass in the context per call or have a map of app id to context. I 
would favor the latter approach because it'd be easier on the perspective of 
callers of putEntities().
I too agree it will be easier from the perspective of callers of putEntities(), but if we favor a map of {{app id to context}}:
* the implicit assumption would be that {{putEntities(TimelineEntities)}} will be for the same appId (i.e. will have the same context)
* TimelineEntities as such do not have the appId explicitly, so I am planning to modify {{TimelineCollector.getTimelineEntityContext()}} to {{TimelineCollector.getTimelineEntityContext(TimelineEntity.Identifier id)}}, and subclasses of TimelineCollector can take care of mapping the Id to the Context (via appId) if required.
* the code of {{putEntities(TimelineEntities)}} would look something like
{code}
Iterator<TimelineEntity> iterator = entities.getEntities().iterator();
// All entities in a single putEntities() call are assumed to share one context,
// so the first entity's identifier is used to resolve it.
TimelineEntity next = iterator.hasNext() ? iterator.next() : null;
if (next != null) {
  TimelineCollectorContext context =
      getTimelineEntityContext(next.getIdentifier());
  return writer.write(context.getClusterId(), context.getUserId(),
      context.getFlowId(), context.getFlowRunId(), context.getAppId(),
      entities);
}
{code}

If that is OK, shall I work on it?
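As a rough illustration of the {{app id to context}} map option (with simplified stand-in types rather than the real TimelineCollector/TimelineCollectorContext API):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an appId -> context registry kept by an RM-side collector
// so putEntities() can resolve the right context per entity. Illustrative only.
public class PerAppContextRegistry {

  // Simplified stand-in for TimelineCollectorContext.
  public static final class Context {
    final String clusterId, userId, flowId, appId;
    final long flowRunId;

    Context(String clusterId, String userId, String flowId,
        long flowRunId, String appId) {
      this.clusterId = clusterId;
      this.userId = userId;
      this.flowId = flowId;
      this.flowRunId = flowRunId;
      this.appId = appId;
    }
  }

  private final Map<String, Context> contexts = new ConcurrentHashMap<>();

  // Register when the RM starts tracking an app's entities.
  public void register(String appId, Context context) {
    contexts.put(appId, context);
  }

  // Resolve the context for an entity's appId during putEntities().
  public Context lookup(String appId) {
    return contexts.get(appId);
  }

  // Remove when the app finishes, to keep the map bounded.
  public void remove(String appId) {
    contexts.remove(appId);
  }
}
{code}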



> RMTimelineCollector should have the context info of each app
> 
>
> Key: YARN-3390
> URL: https://issues.apache.org/jira/browse/YARN-3390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> RMTimelineCollector should have the context info of each app whose entity  
> has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394052#comment-14394052
 ] 

Junping Du commented on YARN-3334:
--

Thanks [~zjshen] and [~sjlee0] for comments!
bq. If so, I suggest combining the two messages together and recording an error-level log (the first message is actually useless if we always report the second one).
That sounds OK. Will update a quick fix.

bq. However, I do worry about the size of the map produced in the response in 
ResourceTrackerService. It can be potentially quite large every time and has a 
potential impact on many things as it is part of the NM heartbeat handling. 
It's OK for now, but we should try to address it sooner than later.
Just filed YARN-3445 to track this issue. This is also needed for graceful decommission (YARN-914): a decommissioning node can be terminated earlier by the RM if it has no running apps.

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3445) NM notify RM on running Apps in NM-RM heartbeat

2015-04-02 Thread Junping Du (JIRA)
Junping Du created YARN-3445:


 Summary: NM notify RM on running Apps in NM-RM heartbeat
 Key: YARN-3445
 URL: https://issues.apache.org/jira/browse/YARN-3445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Junping Du
Assignee: Junping Du


Per discussion in YARN-3334, we need to filter out unnecessary collector info from the RM in the heartbeat response. Our proposal is to add an additional field for running apps in the NM heartbeat request, so the RM only sends back collectors for the apps running locally. This is also needed for YARN-914 (graceful decommission): if an NM in the decommissioning stage has no running apps, it will get decommissioned immediately.
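A minimal sketch of the RM-side filtering, assuming the heartbeat carries the set of locally running application IDs and the RM keeps an app-to-collector-address map (types simplified, not the actual protocol records):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: trim the app -> collector-address map down to the apps the
// NM reported as running, so the heartbeat response stays small.
public final class CollectorFilter {

  public static Map<String, String> collectorsForRunningApps(
      Map<String, String> allKnownCollectors, Set<String> runningAppsOnNode) {
    Map<String, String> filtered = new HashMap<>();
    for (String appId : runningAppsOnNode) {
      String collectorAddress = allKnownCollectors.get(appId);
      if (collectorAddress != null) {
        filtered.put(appId, collectorAddress);
      }
    }
    return filtered;
  }

  private CollectorFilter() {
  }
}
{code}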



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394003#comment-14394003
 ] 

Hadoop QA commented on YARN-3443:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709150/YARN-3443.001.patch
  against trunk revision bad070f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1150 javac 
compiler warnings (more than the trunk's current 1148 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7210//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7210//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7210//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7210//console

This message is automatically generated.

> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM - e.g. network, disk, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393976#comment-14393976
 ] 

Tsuyoshi Ozawa commented on YARN-2666:
--

OK, I'll check it.

> TestFairScheduler.testContinuousScheduling fails Intermittently
> ---
>
> Key: YARN-2666
> URL: https://issues.apache.org/jira/browse/YARN-2666
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scheduler
>Reporter: Tsuyoshi Ozawa
>Assignee: zhihai xu
> Attachments: YARN-2666.000.patch
>
>
> The test fails on trunk.
> {code}
> Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
>   Time elapsed: 0.582 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3435) AM container to be allocated Appattempt AM container shown as null

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393975#comment-14393975
 ] 

Hadoop QA commented on YARN-3435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709003/YARN-3435.001.patch
  against trunk revision bad070f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7208//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7208//console

This message is automatically generated.

> AM container to be allocated Appattempt AM container shown as null
> --
>
> Key: YARN-3435
> URL: https://issues.apache.org/jira/browse/YARN-3435
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: 1RM,1DN
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Trivial
> Attachments: Screenshot.png, YARN-3435.001.patch
>
>
> Submit yarn application
> Open http://:8088/cluster/appattempt/appattempt_1427984982805_0003_01 
> Before the AM container is allocated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-02 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393971#comment-14393971
 ] 

Sidharta Seethana commented on YARN-3366:
-

Since this patch requires uncommitted changes from 
https://issues.apache.org/jira/browse/YARN-3443, I am not submitting this patch 
to a pre-commit build for the time being.

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch
>
>
> In order to isolate based on / enforce outbound traffic bandwidth limits, we 
> need a mechanism to classify/shape network traffic in the nodemanager. For more 
> information on the design, please see the attached design document in the 
> parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-02 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3366:

Attachment: YARN-3366.001.patch

Attaching a patch with an implementation of traffic classification/shaping for 
traffic originating from YARN containers. This patch depends on changes/patches 
from https://issues.apache.org/jira/browse/YARN-3365 and  
https://issues.apache.org/jira/browse/YARN-3443

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch
>
>
> In order to isolate based on / enforce outbound traffic bandwidth limits, we 
> need a mechanism to classify/shape network traffic in the nodemanager. For more 
> information on the design, please see the attached design document in the 
> parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-02 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3443:

Attachment: YARN-3443.001.patch

Attaching a patch that 1) separates out the CGroup implementation into a reusable class, 2) creates a 'PrivilegedContainerExecutor' that wraps the container-executor binary and can be used for operations that require elevated privileges, and 3) creates a simple ResourceHandler interface that can be used to plug in support for new resource types.
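A rough sketch of what such a pluggable interface could look like (method names are illustrative and not necessarily those in the attached patch):
{code}
import java.util.List;

// Illustrative lifecycle hooks for a pluggable per-resource handler (e.g. a
// cgroups-backed network or disk handler); not the actual YARN-3443 API.
public interface ResourceHandler {

  // One-time setup, e.g. locating/mounting the relevant cgroup hierarchy.
  List<String> bootstrap() throws ResourceHandlerException;

  // Before a container starts: create its cgroup, apply limits, and return any
  // operations that must run with elevated privileges at launch time.
  List<String> preStart(String containerId) throws ResourceHandlerException;

  // After a container finishes: clean up per-container state.
  List<String> postComplete(String containerId) throws ResourceHandlerException;

  // Shutdown hook for the handler itself.
  List<String> teardown() throws ResourceHandlerException;
}

// Minimal exception type so the sketch is self-contained.
class ResourceHandlerException extends Exception {
  ResourceHandlerException(String message) {
    super(message);
  }
}
{code}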

> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM - e.g. network, disk, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2015-04-02 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393938#comment-14393938
 ] 

Sidharta Seethana commented on YARN-2424:
-

It looks like different versions of the patch to fix this were committed to branch-2 and trunk? The corresponding changes to LinuxContainerExecutor.java look different.

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: Y2424-1.patch, YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393935#comment-14393935
 ] 

Hadoop QA commented on YARN-3436:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709010/YARN-3436.001.patch
  against trunk revision bad070f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7209//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7209//console

This message is automatically generated.

> Doc WebServicesIntro.html Example Rest API url wrong
> 
>
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-3436.001.patch
>
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET 
> http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> The URL should be ws/v1/cluster/{color:red}apps{color}.
> Two examples on the same page are wrong.
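For reference, a small client-side sketch using the corrected path (the host, port, and application ID below are just the placeholders from the documentation example):
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative GET against the corrected path .../ws/v1/cluster/apps/<app id>
// (note "apps", not "app").
public class RmRestExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://rmhost.domain:8088/ws/v1/cluster/apps/application_1324057493980_0001");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);  // JSON response with the single app resource
      }
    }
  }
}
{code}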



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3444) Fixed typo (capability)

2015-04-02 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393861#comment-14393861
 ] 

Gabor Liptak commented on YARN-3444:


Pull request at https://github.com/apache/hadoop/pull/15

> Fixed typo (capability)
> ---
>
> Key: YARN-3444
> URL: https://issues.apache.org/jira/browse/YARN-3444
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications/distributed-shell
>Reporter: Gabor Liptak
>Priority: Minor
>
> Fixed typo (capability)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3444) Fixed typo (capability)

2015-04-02 Thread Gabor Liptak (JIRA)
Gabor Liptak created YARN-3444:
--

 Summary: Fixed typo (capability)
 Key: YARN-3444
 URL: https://issues.apache.org/jira/browse/YARN-3444
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Gabor Liptak
Priority: Minor


Fixed typo (capability)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-02 Thread Sidharta Seethana (JIRA)
Sidharta Seethana created YARN-3443:
---

 Summary: Create a 'ResourceHandler' subsystem to ease addition of 
support for new resource types on the NM
 Key: YARN-3443
 URL: https://issues.apache.org/jira/browse/YARN-3443
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana


The current cgroups implementation is closely tied to supporting CPU as a resource. We need to separate out CGroups support as well as provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM - e.g. network, disk, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393810#comment-14393810
 ] 

Hudson commented on YARN-2901:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7501 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7501/])
YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev 
via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* hadoop-common-project/hadoop-common/src/main/conf/log4j.properties
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java


> Add errors and warning metrics page to RM, NM web UI
> 
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
> apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm open 
> to suggestions on alternate mechanisms for implementing this).
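As a rough sketch of the custom-appender idea (log4j 1.2 style, simply counting WARN/ERROR events; illustrative only, not the committed Log4jWarningErrorMetricsAppender):
{code}
import java.util.concurrent.atomic.AtomicLong;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

// Counts warnings and errors so a web UI page can report them later.
public class WarningErrorCountingAppender extends AppenderSkeleton {

  private static final AtomicLong ERRORS = new AtomicLong();
  private static final AtomicLong WARNINGS = new AtomicLong();

  @Override
  protected void append(LoggingEvent event) {
    if (Level.ERROR.equals(event.getLevel())) {
      ERRORS.incrementAndGet();
    } else if (Level.WARN.equals(event.getLevel())) {
      WARNINGS.incrementAndGet();
    }
  }

  public static long getErrorCount() {
    return ERRORS.get();
  }

  public static long getWarningCount() {
    return WARNINGS.get();
  }

  @Override
  public void close() {
    // nothing to release
  }

  @Override
  public boolean requiresLayout() {
    return false;  // events are only counted, never formatted
  }
}
{code}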



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2901:
-
Summary: Add errors and warning metrics page to RM, NM web UI  (was: Add 
errors and warning stats to RM, NM web UI)

> Add errors and warning metrics page to RM, NM web UI
> 
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
> apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393789#comment-14393789
 ] 

Sidharta Seethana commented on YARN-3365:
-

Actually, never mind - it seems like the banned user list wasn't affected.

-Sid

> Add support for using the 'tc' tool via container-executor
> --
>
> Key: YARN-3365
> URL: https://issues.apache.org/jira/browse/YARN-3365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Fix For: 2.8.0
>
> Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
> YARN-3365.003.patch
>
>
> We need the following functionality :
> 1) modify network interface traffic shaping rules - to be able to attach a 
> qdisc, create child classes etc
> 2) read existing rules in place 
> 3) read stats for the various classes 
> Using tc requires elevated privileges - hence this functionality is to be 
> made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2942:

Attachment: ConcatableAggregatedLogsProposal_v5.pdf

I've uploaded a v5 doc which addresses those changes. I also clarified a few other things in there.

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393775#comment-14393775
 ] 

Sidharta Seethana commented on YARN-3365:
-

Thanks, Vinod! We'll need a small patch to undo the banned-users change in branch-2.

> Add support for using the 'tc' tool via container-executor
> --
>
> Key: YARN-3365
> URL: https://issues.apache.org/jira/browse/YARN-3365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Fix For: 2.8.0
>
> Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
> YARN-3365.003.patch
>
>
> We need the following functionality :
> 1) modify network interface traffic shaping rules - to be able to attach a 
> qdisc, create child classes etc
> 2) read existing rules in place 
> 3) read stats for the various classes 
> Using tc requires elevated privileges - hence this functionality is to be 
> made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393773#comment-14393773
 ] 

Hudson commented on YARN-3365:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7500 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7500/])
YARN-3365. Enhanced NodeManager to support using the 'tc' tool via 
container-executor for outbound network traffic control. Contributed by 
Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java


> Add support for using the 'tc' tool via container-executor
> --
>
> Key: YARN-3365
> URL: https://issues.apache.org/jira/browse/YARN-3365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Fix For: 2.8.0
>
> Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
> YARN-3365.003.patch
>
>
> We need the following functionality :
> 1) modify network interface traffic shaping rules - to be able to attach a 
> qdisc, create child classes etc
> 2) read existing rules in place 
> 3) read stats for the various classes 
> Using tc requires elevated privileges - hence this functionality is to be 
> made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3365:
--
Fix Version/s: 2.8.0

> Add support for using the 'tc' tool via container-executor
> --
>
> Key: YARN-3365
> URL: https://issues.apache.org/jira/browse/YARN-3365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Fix For: 2.8.0
>
> Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
> YARN-3365.003.patch
>
>
> We need the following functionality :
> 1) modify network interface traffic shaping rules - to be able to attach a 
> qdisc, create child classes etc
> 2) read existing rules in place 
> 3) read stats for the various classes 
> Using tc requires elevated privileges - hence this functionality is to be 
> made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393750#comment-14393750
 ] 

Sangjin Lee commented on YARN-3051:
---

bq. To plot graphs based on timeseries data, we may need to provide a time window for metrics too. This would be useful in the case of the getEntity() API. So do we specify this time window separately for each metric to be retrieved, or the same for all metrics?

My sense is that it should be fine to use the same time window for all metrics. 
[~gtCarrera9]? [~zjshen]?

bq. Queries based on relations, i.e. queries such as get all containers for an app. We can return the relatesto field while querying for an app, and then the client can use this result to fetch detailed info about related entities. Is that fine? Or do we have to handle it as part of a single query?

For now, let's assume 2 queries from the client side. My thinking was that this 
is an optimization. If the storage can return two levels of entities 
efficiently, we could potentially exploit it. But maybe that's nice to have at 
the moment.

bq. Some understanding on how flow id, flow run id will be stored is required.

Li just posted the schema design in YARN-3134. That should be helpful.
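To make the shared-time-window point concrete, a hypothetical reader-side signature (names and the generic entity type are invented for illustration, not the actual YARN-3051 interface):
{code}
import java.util.Set;

// Hypothetical sketch: one [metricsTimeBegin, metricsTimeEnd] window applies to
// every metric requested for the entity, rather than a window per metric.
public interface TimelineEntityReader<E> {

  E getEntity(String clusterId, String userId, String flowId, Long flowRunId,
      String appId, String entityType, String entityId,
      Set<String> metricsToRetrieve,   // which metrics to return
      long metricsTimeBegin,           // shared window start (ms since epoch)
      long metricsTimeEnd);            // shared window end (ms since epoch)
}
{code}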

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393743#comment-14393743
 ] 

zhihai xu commented on YARN-2666:
-

Hi [~ozawa], I rebased the patch YARN-2666.000.patch on the latest code base and it passed the Jenkins test.
Do you have time to review/commit the patch? Many thanks.

> TestFairScheduler.testContinuousScheduling fails Intermittently
> ---
>
> Key: YARN-2666
> URL: https://issues.apache.org/jira/browse/YARN-2666
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scheduler
>Reporter: Tsuyoshi Ozawa
>Assignee: zhihai xu
> Attachments: YARN-2666.000.patch
>
>
> The test fails on trunk.
> {code}
> Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
>   Time elapsed: 0.582 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2015-04-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-685.
-
Resolution: Invalid

According to the test result from [~raviprak], CS distributes reducers fairly to NMs in the cluster. Resolving this as invalid; please reopen it if you still think this is a problem.


> Capacity Scheduler is not distributing the reducers tasks across the cluster
> 
>
> Key: YARN-685
> URL: https://issues.apache.org/jira/browse/YARN-685
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Devaraj K
>
> If we have reducers whose total memory required to complete is less than the 
> total cluster memory, the scheduler is not assigning the reducers to all the 
> nodes (approximately) uniformly. Also, at that time there are no other jobs or 
> job tasks running in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393725#comment-14393725
 ] 

Hadoop QA commented on YARN-2666:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709083/YARN-2666.000.patch
  against trunk revision 6a6a59d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7207//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7207//console

This message is automatically generated.

> TestFairScheduler.testContinuousScheduling fails Intermittently
> ---
>
> Key: YARN-2666
> URL: https://issues.apache.org/jira/browse/YARN-2666
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scheduler
>Reporter: Tsuyoshi Ozawa
>Assignee: zhihai xu
> Attachments: YARN-2666.000.patch
>
>
> The test fails on trunk.
> {code}
> Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
>   Time elapsed: 0.582 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393723#comment-14393723
 ] 

Sangjin Lee commented on YARN-3334:
---

I took a quick look at the latest patch, and it looks good for the most part.

However, I do worry about the size of the map produced in the response in 
ResourceTrackerService. It can be potentially quite large every time and has a 
potential impact on many things as it is part of the NM heartbeat handling. 
It's OK for now, but we should try to address it sooner than later.

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393717#comment-14393717
 ] 

Robert Kanter commented on YARN-2942:
-

Yes, it does a blocking wait.  I think this will end up being in a separate 
thread anyway because it's being done after uploading the logs to HDFS.  
However, I think making it a separate service is a good idea anyway.  As you 
said, this handles NM restart, and allows us to later add more flexibility.

If you upgrade the JHS before the NM, it's not the end of the world.  New logs 
wouldn't be found by the JHS, but that only hurts users trying to view those 
logs through the JHS.  Once the JHS is updated, they would be viewable.  In any 
case, having the two configs is probably more confusing than it needs to be for 
the user, and we'd have to take care of the case where the new format is 
disabled but concatenation is enabled (which is invalid).  I think we should 
just make this one config: either the new format and concatenation are both enabled or neither is.

I'll post an updated doc shortly.

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393711#comment-14393711
 ] 

Wangda Tan commented on YARN-3434:
--

[~tgraves],
I feel like this issue and several related issues are solved by YARN-3243 
already. Could you please check if this problem is already solved?

Thanks,

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers, each 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393700#comment-14393700
 ] 

Karthik Kambatla commented on YARN-2942:


(Canceled the patch to stop Jenkins from evaluating the design doc :) ) 

[~rkanter] - thanks for updating the design doc. A couple of comments:
# If there is an NM X actively concatenating its logs and NM Y can't acquire 
the lock, what happens? 
## Does it do a blocking-wait? If yes, this should likely be in a separate 
thread.
## I would like for it to be non-blocking. How about a LogConcatenationService 
in the NM? This service is brought up if you enable log concatenation. This 
service would periodically go through all of its past aggregated logs and 
concatenate those that it can acquire a lock for. Delayed concatenation should 
be okay because we are doing this primarily to handle the problem HDFS has with 
small files. Also, this way, we don't have to do anything different for NM 
restart. Forward looking, this concat service could potentially take input on 
how busy HDFS is. (A rough sketch of such a service follows this list.)
# I didn't completely understand the point about a config to specify the format. Are you suggesting we have two different on/off configs - one to turn on concatenation and one to specify the format the JHS should be reading? I think we should have just one config that clearly states that turning this on on an NM (writer) requires the JHS (reader) to already have it enabled. In case of rolling upgrades, this translates to requiring a JHS upgrade prior to the NM upgrade.
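A minimal sketch of the non-blocking service idea above (all names are hypothetical, and the lock-acquisition mechanism is deliberately left abstract since that detail belongs to the design doc):
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical NM-side service: periodically scan past aggregated log files and
// concatenate the ones whose per-application lock can be acquired.
public abstract class LogConcatenationService {

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final long scanIntervalSeconds;

  protected LogConcatenationService(long scanIntervalSeconds) {
    this.scanIntervalSeconds = scanIntervalSeconds;
  }

  public void start() {
    // Non-blocking: the scan runs in its own thread, never in the upload path.
    scheduler.scheduleWithFixedDelay(this::scanAndConcatenate,
        scanIntervalSeconds, scanIntervalSeconds, TimeUnit.SECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }

  private void scanAndConcatenate() {
    for (String appId : pendingApplications()) {
      if (tryAcquireLock(appId)) {      // skip apps another NM is working on
        try {
          concatenateLogsFor(appId);    // append this NM's file to the per-app file
        } finally {
          releaseLock(appId);
        }
      }
    }
  }

  // These hooks stand in for details defined in the design doc.
  protected abstract Iterable<String> pendingApplications();
  protected abstract boolean tryAcquireLock(String appId);
  protected abstract void releaseLock(String appId);
  protected abstract void concatenateLogsFor(String appId);
}
{code}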

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-02 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3134:

Attachment: YARN-3134DataSchema.pdf

After some community discussion we're finalizing the Phoenix data schema design 
for the very first phase. In this phase we focus on storing basic entities and 
their metrics, configs, and events. The attached document is a summary of our 
discussion results. Comments are more than welcome. 

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---
>
> Key: YARN-3134
> URL: https://issues.apache.org/jira/browse/YARN-3134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and we 
> can easily build indexes and compose complex queries.
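As a small illustration of the client-embedded JDBC usage described above (the table, columns, and ZooKeeper quorum are placeholders, not the schema from the attached document):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

// Illustrative Phoenix-over-HBase access via plain JDBC.
public class PhoenixExample {
  public static void main(String[] args) throws Exception {
    // "localhost" stands in for the ZooKeeper quorum of the HBase cluster.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
      try (Statement stmt = conn.createStatement()) {
        stmt.execute("CREATE TABLE IF NOT EXISTS entity_metrics ("
            + " entity_id VARCHAR NOT NULL,"
            + " metric_name VARCHAR NOT NULL,"
            + " ts BIGINT NOT NULL,"
            + " metric_value DOUBLE"
            + " CONSTRAINT pk PRIMARY KEY (entity_id, metric_name, ts))");
      }
      try (PreparedStatement ps = conn.prepareStatement(
          "UPSERT INTO entity_metrics VALUES (?, ?, ?, ?)")) {
        ps.setString(1, "application_0001");
        ps.setString(2, "memory_mb");
        ps.setLong(3, System.currentTimeMillis());
        ps.setDouble(4, 2048.0);
        ps.executeUpdate();
      }
      conn.commit();  // Phoenix connections are not auto-commit by default
    }
  }
}
{code}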



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393645#comment-14393645
 ] 

Sangjin Lee commented on YARN-3391:
---

I am fine with tabling this discussion and revisiting it later in the interest 
of making progress.

I just wanted to add my 2 cents that this is something we already see and 
experience with hRaven so it's not theoretical. That's the context from our 
side. The way I see it is that apps that do not have the flow name are 
basically a degenerate case of a single-app flow. This is unrelated to the 
app-to-flow aggregation. It has to do with the flowRun-to-flow aggregation. And 
it's something we want the users to do when they can set the flow name. FWIW...

> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393643#comment-14393643
 ] 

Zhijie Shen commented on YARN-3390:
---

bq. I would favor the latter approach 

+1

> RMTimelineCollector should have the context info of each app
> 
>
> Key: YARN-3390
> URL: https://issues.apache.org/jira/browse/YARN-3390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> RMTimelineCollector should have the context info of each app whose entity  
> has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393631#comment-14393631
 ] 

Zhijie Shen commented on YARN-3334:
---

If so, I suggest combining the two messages into one and recording an error-level 
log (the first message is actually useless if we always report the second one).
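
To make the suggestion concrete, here is a hedged sketch of what the combined 
error-level check could look like; the helper class, method name, and surrounding 
retry handling are assumptions, not the actual code in the patch.
{code}
import com.sun.jersey.api.client.ClientResponse;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative sketch only: combine the "no response" and "response with failure"
// cases into a single error-level message instead of two separate logs.
public final class PutEntitiesResponseCheck {
  private static final Log LOG = LogFactory.getLog(PutEntitiesResponseCheck.class);

  public static void check(ClientResponse resp) throws YarnException {
    if (resp == null
        || resp.getStatus() != ClientResponse.Status.OK.getStatusCode()) {
      String msg = "Failed to put timeline entities: "
          + (resp == null ? "no response received"
              : "unexpected HTTP status " + resp.getStatus());
      LOG.error(msg);   // error level so it is visible even when debug is off
      throw new YarnException(msg);
    }
  }
}
{code}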

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393615#comment-14393615
 ] 

Sangjin Lee commented on YARN-3390:
---

I think we need to either pass in the context per call or have a map of app id 
to context. I would favor the latter approach because it'd be easier on the 
perspective of callers of putEntities().
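
A minimal sketch of the map-based approach, assuming a hypothetical per-app 
context holder; the real context type and its fields in RMTimelineCollector may 
differ.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Illustrative sketch: keep per-app context in a map keyed by ApplicationId so
// callers of putEntities() don't have to pass the context on every call.
public class AppContextRegistry {
  // Hypothetical holder for user / flow name / flow run id, etc.
  public static class AppContext {
    final String user;
    final String flowName;
    final long flowRunId;
    AppContext(String user, String flowName, long flowRunId) {
      this.user = user;
      this.flowName = flowName;
      this.flowRunId = flowRunId;
    }
  }

  private final Map<ApplicationId, AppContext> contexts =
      new ConcurrentHashMap<ApplicationId, AppContext>();

  public void register(ApplicationId appId, AppContext ctx) {
    contexts.put(appId, ctx);
  }

  public AppContext lookup(ApplicationId appId) {
    return contexts.get(appId);
  }

  public void unregister(ApplicationId appId) {
    contexts.remove(appId);
  }
}
{code}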

> RMTimelineCollector should have the context info of each app
> 
>
> Key: YARN-3390
> URL: https://issues.apache.org/jira/browse/YARN-3390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> RMTimelineCollector should have the context info of each app whose entity  
> has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI

2015-04-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393587#comment-14393587
 ] 

Craig Welch commented on YARN-3293:
---

   General - it looks like the counters could possibly overflow and produce 
negative values. Perhaps that could never happen in the lifetime of a typical 
cluster, but on a large, long-running cluster is it a possibility/concern?
   This presently looks to be CapacityScheduler-only; I have a suggestion below 
to make it slightly more general, and [~vinodkv] also mentioned "not specific 
to scheduler". Perhaps it's fine to go CapacityScheduler-only for the first 
iteration, but I wanted to verify (perhaps we need a follow-on JIRA for other 
schedulers).
   
on the web page
  It's a nit, but I don't like the look of the / between the counter and the 
resource expression where that occurs; maybe use - instead of / for those 
(allocations/reservations/releases)?
  
TestSchedulerHealth
  Can we import NodeManager and get rid of the package references in the code?
CapacitySchedulerHealthInfo
  It looks like there is no need to keep a reference to the CapacityScheduler 
instance after construction; can we drop it as a member then?
  The line changes in the info log look to be just whitespace; can you drop 
them?
LeafQueue
  L884 looks to be just whitespace; can you revert?
CSAssignment
  I think there should be a new class, shareable between schedulers, which 
incorporates all the new assignment info, and it should be a member of 
CSAssignment instead of adding all of the details directly to CSAssignment. 
You would still pack the info into CSAssignment (as an instance of that type), 
but it would now take a form that can be shared across schedulers.
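
As a rough sketch of that suggestion, something like the following 
scheduler-agnostic holder could become a member of CSAssignment; the class and 
method names here are hypothetical.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hypothetical scheduler-agnostic holder for per-heartbeat assignment details.
// CSAssignment (or a FairScheduler equivalent) would carry an instance of this
// instead of holding the individual counters itself.
public class AssignmentDetails {
  private int allocations;
  private int reservations;
  private int releases;
  private Resource allocatedResource = Resources.createResource(0, 0);

  public void recordAllocation(Resource r) {
    allocations++;
    Resources.addTo(allocatedResource, r);
  }

  public void recordReservation() { reservations++; }

  public void recordRelease() { releases++; }

  public int getAllocations() { return allocations; }
  public int getReservations() { return reservations; }
  public int getReleases() { return releases; }
  public Resource getAllocatedResource() { return allocatedResource; }
}
{code}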

> Track and display capacity scheduler health metrics in web UI
> -
>
> Key: YARN-3293
> URL: https://issues.apache.org/jira/browse/YARN-3293
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, 
> apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch
>
>
> It would be good to display metrics that let users know about the health of 
> the capacity scheduler in the web UI. Today it is hard to get an idea if the 
> capacity scheduler is functioning correctly. Metrics such as the time for the 
> last allocation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3437:
-
Target Version/s: YARN-2928

> convert load test driver to timeline service v.2
> 
>
> Key: YARN-3437
> URL: https://issues.apache.org/jira/browse/YARN-3437
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-3437.001.patch
>
>
> This subtask covers the work for converting the proposed patch for the load 
> test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393563#comment-14393563
 ] 

Wangda Tan commented on YARN-3410:
--

Thanks for your comment, [~rohithsharma].

But what's the use case for using rmadmin to remove a record while the RM is 
running? The command is just a way to recover when an app has entered an 
unexpected state and the RM cannot get started. Unless there's a use case for 
doing that, I suggest scoping this to an RM startup option like YARN-2131.

> YARN admin should be able to remove individual application records from 
> RMStateStore
> 
>
> Key: YARN-3410
> URL: https://issues.apache.org/jira/browse/YARN-3410
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, yarn
>Reporter: Wangda Tan
>Assignee: Rohith
>Priority: Critical
>
> When RM state store entered an unexpected state, one example is YARN-2340, 
> when an attempt is not in final state but app already completed, RM can never 
> get up unless format RMStateStore.
> I think we should support remove individual application records from 
> RMStateStore to unblock RM admin make choice of either waiting for a fix or 
> format state store.
> In addition, RM should be able to report all fatal errors (which will 
> shutdown RM) when doing app recovery, this can save admin some time to remove 
> apps in bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393556#comment-14393556
 ] 

Hadoop QA commented on YARN-2729:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12708788/YARN-2729.20150402-1.patch
  against trunk revision 6a6a59d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7205//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7205//console

This message is automatically generated.

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393554#comment-14393554
 ] 

Hadoop QA commented on YARN-3437:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709078/YARN-3437.001.patch
  against trunk revision 6a6a59d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7206//console

This message is automatically generated.

> convert load test driver to timeline service v.2
> 
>
> Key: YARN-3437
> URL: https://issues.apache.org/jira/browse/YARN-3437
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-3437.001.patch
>
>
> This subtask covers the work for converting the proposed patch for the load 
> test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393545#comment-14393545
 ] 

Wangda Tan commented on YARN-2729:
--

Some comments:
*1) Configuration:*
Instead of distributed_node_labels_prefix, do you think it is better to name it 
"yarn.node-labels.nm.provider"? The "distributed.node-labels-provider" name 
doesn't clearly indicate that it runs on the NM side.

I don't want to expose a class name in the config unless it is necessary. Right 
now we have two options, one script-based and one config-based. We can treat 
the two as a "white-list": if a given value is not in the whitelist, try to 
load a class with that name. So the option would be: 
yarn.node-labels.nm.provider = "config/script/other-class-name".

Revisted interval, I think it's better to make it to be provider configuration 
instead of script-provider-only configuration. Since config/script will share 
it (I remember I have some back-and-forth opinions here).
If you agree above, the name could be: 
yarn.node-labels.nm.provider-fetch-interval-ms (and provider-fetch-timeout-ms)

And script-related options could be:
yarn.node-labels.nm.provider.script.path/opts
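
Spelling the proposal out, the keys could look like the following constants; 
these are just the names suggested above written as code, not existing 
YarnConfiguration entries.
{code}
// Illustrative only: the proposed NM-side node-labels provider keys as constants.
public interface NodeLabelsProviderConfig {
  String NM_NODE_LABELS_PREFIX = "yarn.node-labels.nm.";

  // "config", "script", or a fully qualified provider class name.
  String NM_NODE_LABELS_PROVIDER = NM_NODE_LABELS_PREFIX + "provider";

  // Shared by all providers, not script-specific.
  String NM_PROVIDER_FETCH_INTERVAL_MS =
      NM_NODE_LABELS_PREFIX + "provider-fetch-interval-ms";
  String NM_PROVIDER_FETCH_TIMEOUT_MS =
      NM_NODE_LABELS_PREFIX + "provider-fetch-timeout-ms";

  // Script-specific options.
  String NM_SCRIPT_PATH = NM_NODE_LABELS_PREFIX + "provider.script.path";
  String NM_SCRIPT_OPTS = NM_NODE_LABELS_PREFIX + "provider.script.opts";
}
{code}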

*2) Implementation of ScriptBasedNodeLabelsProvider*
I feel like ScriptBased and ConfigBased can share some implementations, they 
will all init a time task, get interval and run, check timeout (meaningless for 
config-based), etc.
Can you make an abstract class and inherited by ScriptBased?
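
A minimal sketch of such a shared base class, assuming providers only need to 
implement a fetch step while the base owns the timer and interval handling; the 
names and structure here are illustrative only.
{code}
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

// Illustrative sketch of an abstract base shared by the script-based and
// config-based providers: it owns the timer, interval handling and the cached
// labels; subclasses only implement fetchLabels().
public abstract class AbstractNodeLabelsProvider {
  private final long intervalMs;       // <= 0 could mean "fetch once, no timer"
  private volatile Set<String> labels;
  private Timer timer;

  protected AbstractNodeLabelsProvider(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  /** Subclasses fetch labels from a script, a config file, etc. */
  protected abstract Set<String> fetchLabels() throws Exception;

  public void start() {
    if (intervalMs <= 0) {
      refresh();
      return;
    }
    timer = new Timer("NodeLabelsProvider", true);
    timer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        refresh();
      }
    }, 0, intervalMs);
  }

  private void refresh() {
    try {
      labels = fetchLabels();
    } catch (Exception e) {
      // keep the previously fetched labels on failure
    }
  }

  public Set<String> getNodeLabels() {
    return labels;
  }

  public void stop() {
    if (timer != null) {
      timer.cancel();
    }
  }
}
{code}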

DISABLE_TIMER_CONFIG should be part of YarnConfiguration; all configuration 
defaults should be part of YarnConfiguration.

canRun -> something like verifyConfiguredScript, which should directly throw an 
exception when something is wrong (so that the admin can know what really 
happened, such as file not found or no execution permission), and it should be 
private and non-static.

checkAndThrowLabelName should be called in NodeStatusUpdaterImpl

label need to be trim() when called checkAndThrowLabelName(...)




> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2666:

Attachment: YARN-2666.000.patch

> TestFairScheduler.testContinuousScheduling fails Intermittently
> ---
>
> Key: YARN-2666
> URL: https://issues.apache.org/jira/browse/YARN-2666
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scheduler
>Reporter: Tsuyoshi Ozawa
>Assignee: zhihai xu
> Attachments: YARN-2666.000.patch
>
>
> The test fails on trunk.
> {code}
> Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
>   Time elapsed: 0.582 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2666:

Attachment: (was: YARN-2666.000.patch)

> TestFairScheduler.testContinuousScheduling fails Intermittently
> ---
>
> Key: YARN-2666
> URL: https://issues.apache.org/jira/browse/YARN-2666
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scheduler
>Reporter: Tsuyoshi Ozawa
>Assignee: zhihai xu
>
> The test fails on trunk.
> {code}
> Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
>   Time elapsed: 0.582 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393530#comment-14393530
 ] 

Sangjin Lee commented on YARN-3437:
---

Added a few folks for review.

> convert load test driver to timeline service v.2
> 
>
> Key: YARN-3437
> URL: https://issues.apache.org/jira/browse/YARN-3437
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-3437.001.patch
>
>
> This subtask covers the work for converting the proposed patch for the load 
> test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3437:
--
Attachment: YARN-3437.001.patch

Patch v.1 posted.

This is basically a modification of the YARN-2556 patch (and clean-up of issues 
etc.) to work against the timeline service v.2.

Since the new distributed timeline service collectors are tied to applications, 
I chose the approach of instantiating the base timeline collector within the 
mapper task, rather than going through the timeline client. Making it go 
through the timeline client has a number of challenges (see YARN-3378). But 
this should still be effective as a way to exercise the bulk of the write path 
and its performance and scalability.

You can try this out by doing for example

{code}
hadoop jar 
share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar
 timelineperformance -m 10 -t 1000
{code}

You'll get the output like

{noformat}
TRANSACTION RATE (per mapper): 5027.652086 ops/s
IO RATE (per mapper): 5027.652086 KB/s
TRANSACTION RATE (total): 50276.520865 ops/s
IO RATE (total): 50276.520865 KB/s
{noformat}

It is still using pretty simple entities to write to the storage. I'll work on 
adding handling of job history files later in a different JIRA.

I would greatly appreciate your review. Thanks!

> convert load test driver to timeline service v.2
> 
>
> Key: YARN-3437
> URL: https://issues.apache.org/jira/browse/YARN-3437
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-3437.001.patch
>
>
> This subtask covers the work for converting the proposed patch for the load 
> test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393496#comment-14393496
 ] 

Hadoop QA commented on YARN-3365:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12707355/YARN-3365.003.patch
  against trunk revision 6a6a59d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7203//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7203//console

This message is automatically generated.

> Add support for using the 'tc' tool via container-executor
> --
>
> Key: YARN-3365
> URL: https://issues.apache.org/jira/browse/YARN-3365
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
> YARN-3365.003.patch
>
>
> We need the following functionality :
> 1) modify network interface traffic shaping rules - to be able to attach a 
> qdisc, create child classes etc
> 2) read existing rules in place 
> 3) read stats for the various classes 
> Using tc requires elevated privileges - hence this functionality is to be 
> made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393494#comment-14393494
 ] 

Wangda Tan commented on YARN-2901:
--

+1 for the patch. Will commit it today if there are no opposing opinions.

> Add errors and warning stats to RM, NM web UI
> -
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
> apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393493#comment-14393493
 ] 

Hadoop QA commented on YARN-3388:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709050/YARN-3388-v1.patch
  against trunk revision eccb7d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7201//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7201//console

This message is automatically generated.

> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up-to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

2015-04-02 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393476#comment-14393476
 ] 

zhihai xu commented on YARN-3415:
-

Thanks [~ragarwal] for valuable feedback and filing this issue. Thanks 
[~sandyr]  for valuable feedback and committing the patch! Greatly appreciated.

> Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler 
> queue
> --
>
> Key: YARN-3415
> URL: https://issues.apache.org/jira/browse/YARN-3415
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
> YARN-3415.002.patch
>
>
> We encountered this problem while running a spark cluster. The 
> amResourceUsage for a queue became artificially high and then the cluster got 
> deadlocked because the maxAMShare constrain kicked in and no new AM got 
> admitted to the cluster.
> I have described the problem in detail here: 
> https://github.com/apache/spark/pull/5233#issuecomment-87160289
> In summary - the condition for adding the container's memory towards 
> amResourceUsage is fragile. It depends on the number of live containers 
> belonging to the app. We saw that the spark AM went down without explicitly 
> releasing its requested containers and then one of those containers memory 
> was counted towards amResource.
> cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393450#comment-14393450
 ] 

Hadoop QA commented on YARN-2942:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12709065/ConcatableAggregatedLogsProposal_v4.pdf
  against trunk revision 6a6a59d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7204//console

This message is automatically generated.

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393451#comment-14393451
 ] 

Wangda Tan commented on YARN-2729:
--

Apparently Jenkins ran the wrong tests; re-kicked Jenkins.

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2015-04-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393445#comment-14393445
 ] 

Hitesh Shah commented on YARN-2890:
---

Sorry, I did not check the last update. 

Minor nit: Some of the test changes in TestMRTimelineEventHandling probably 
belong in TestMiniYarnCluster, if that exists, since YARN timeline flag 
behaviour checks should ideally be tested in YARN code and not MR code. 





> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
> YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393441#comment-14393441
 ] 

Junping Du commented on YARN-3334:
--

Thanks [~zjshen] for review and comments!
bq. but I undid some unnecessary changes in TimelineClientImpl (which seem to 
have been added for code debugging).
I think that is a necessary change. The previous message didn't convey much 
information; in particular, it returned the same message for no response and 
for a response with a failure. Also, the error code should be logged even when 
debug is not on, because this is a serious failure and should be reported in a 
production environment. Thoughts?

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393440#comment-14393440
 ] 

Vinod Kumar Vavilapalli commented on YARN-3318:
---

Filed YARN-3441 and YARN-3442 for parent queues and for limits.


> Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
> LeafQueue supporting present behavior
> ---
>
> Key: YARN-3318
> URL: https://issues.apache.org/jira/browse/YARN-3318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
> YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
> YARN-3318.36.patch, YARN-3318.39.patch
>
>
> Create the initial framework required for using OrderingPolicies with 
> SchedulerApplicaitonAttempts and integrate with the CapacityScheduler.   This 
> will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3442) Consider abstracting out user, app limits etc into some sort of a LimitPolicy

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-3442:
-

 Summary: Consider abstracting out user, app limits etc into some 
sort of a LimitPolicy
 Key: YARN-3442
 URL: https://issues.apache.org/jira/browse/YARN-3442
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Similar to the policies being added in YARN-3318 and YARN-3441 for leaf and 
parent queues, we should consider extracting an abstraction for limits too.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2942:

Attachment: ConcatableAggregatedLogsProposal_v4.pdf

I've just uploaded ConcatableAggregatedLogsProposal_v4.pdf, with an updated 
design that uses a slightly modified version of the CombinedAggregatedLogFormat 
I already wrote (now ConcatableAggregatedLogFormat) and uses HDFS concat to 
combine the files.

[~zjshen], [~kasha], and [~vinodkv], can you take a look at it?

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-02 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393433#comment-14393433
 ] 

Rohini Palaniswamy commented on YARN-3439:
--

bq. Essentially the idea is to reference count the tokens and only attempt to 
cancel them when the token is no longer referenced. 
   Would be a good idea. I think this is the third time we have had delegation 
token renewal broken for Oozie with the Hadoop 2.x line. 
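
For illustration, a bare-bones sketch of the reference-counting idea; the real 
DelegationTokenRenewer bookkeeping is more involved, and the class below is 
hypothetical.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.security.token.Token;

// Illustrative sketch: count how many running applications reference a token
// and only cancel it once the last referencing application has finished.
public class TokenRefCounter {
  private final Map<Token<?>, Integer> refCounts =
      new ConcurrentHashMap<Token<?>, Integer>();

  public synchronized void addReference(Token<?> token) {
    Integer count = refCounts.get(token);
    refCounts.put(token, count == null ? 1 : count + 1);
  }

  /** @return true if the caller should cancel the token now. */
  public synchronized boolean removeReference(Token<?> token) {
    Integer count = refCounts.get(token);
    if (count == null) {
      return false;
    }
    if (count <= 1) {
      refCounts.remove(token);
      return true;        // last reference gone, safe to cancel
    }
    refCounts.put(token, count - 1);
    return false;
  }
}
{code}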

> RM fails to renew token when Oozie launcher leaves before sub-job finishes
> --
>
> Key: YARN-3439
> URL: https://issues.apache.org/jira/browse/YARN-3439
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: YARN-3439.001.patch
>
>
> When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
> linger waiting for the sub-job to finish.  At that point the RM stops 
> renewing delegation tokens for the launcher job which wreaks havoc on the 
> sub-job if the sub-job runs long enough for the tokens to expire.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3441) Introduce the notion of policies for a parent queue

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-3441:
-

 Summary: Introduce the notion of policies for a parent queue
 Key: YARN-3441
 URL: https://issues.apache.org/jira/browse/YARN-3441
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Similar to the policy being added in YARN-3318 for leaf-queues, we need to 
extend this notion to parent-queue too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393410#comment-14393410
 ] 

Hadoop QA commented on YARN-3439:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709044/YARN-3439.001.patch
  against trunk revision eccb7d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7200//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7200//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7200//console

This message is automatically generated.

> RM fails to renew token when Oozie launcher leaves before sub-job finishes
> --
>
> Key: YARN-3439
> URL: https://issues.apache.org/jira/browse/YARN-3439
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: YARN-3439.001.patch
>
>
> When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
> linger waiting for the sub-job to finish.  At that point the RM stops 
> renewing delegation tokens for the launcher job which wreaks havoc on the 
> sub-job if the sub-job runs long enough for the tokens to expire.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393412#comment-14393412
 ] 

Hudson commented on YARN-3415:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7497 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7497/])
YARN-3415. Non-AM containers can be counted towards amResourceUsage of a 
fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 
6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


> Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler 
> queue
> --
>
> Key: YARN-3415
> URL: https://issues.apache.org/jira/browse/YARN-3415
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
> YARN-3415.002.patch
>
>
> We encountered this problem while running a spark cluster. The 
> amResourceUsage for a queue became artificially high and then the cluster got 
> deadlocked because the maxAMShare constrain kicked in and no new AM got 
> admitted to the cluster.
> I have described the problem in detail here: 
> https://github.com/apache/spark/pull/5233#issuecomment-87160289
> In summary - the condition for adding the container's memory towards 
> amResourceUsage is fragile. It depends on the number of live containers 
> belonging to the app. We saw that the spark AM went down without explicitly 
> releasing its requested containers and then one of those containers memory 
> was counted towards amResource.
> cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2015-04-02 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393408#comment-14393408
 ] 

Mit Desai commented on YARN-2890:
-

[~hitesh], did you have any comments on the patch?

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
> YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393388#comment-14393388
 ] 

Vinod Kumar Vavilapalli commented on YARN-3318:
---

bq.  I think it should be fine to make policy interfaces define as well as 
CapacityScheduler changes together with this patch (only for 
FifoOrderingPolicy), it's good to see how interfaces and policies work in CS, 
is it easy or not, etc. =
We can still do this with patches on separate JIRAs - one for the framework, one 
for CS, one for FS, etc. The Fifo one can be here for demonstration, no problem 
with that. Why is it so hard to focus on one thing in one JIRA?

> Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
> LeafQueue supporting present behavior
> ---
>
> Key: YARN-3318
> URL: https://issues.apache.org/jira/browse/YARN-3318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
> YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
> YARN-3318.36.patch, YARN-3318.39.patch
>
>
> Create the initial framework required for using OrderingPolicies with 
> SchedulerApplicaitonAttempts and integrate with the CapacityScheduler.   This 
> will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

2015-04-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-3415:
-
Summary: Non-AM containers can be counted towards amResourceUsage of a Fair 
Scheduler queue  (was: Non-AM containers can be counted towards amResourceUsage 
of a fairscheduler queue)

> Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler 
> queue
> --
>
> Key: YARN-3415
> URL: https://issues.apache.org/jira/browse/YARN-3415
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
> YARN-3415.002.patch
>
>
> We encountered this problem while running a spark cluster. The 
> amResourceUsage for a queue became artificially high and then the cluster got 
> deadlocked because the maxAMShare constrain kicked in and no new AM got 
> admitted to the cluster.
> I have described the problem in detail here: 
> https://github.com/apache/spark/pull/5233#issuecomment-87160289
> In summary - the condition for adding the container's memory towards 
> amResourceUsage is fragile. It depends on the number of live containers 
> belonging to the app. We saw that the spark AM went down without explicitly 
> releasing its requested containers and then one of those containers memory 
> was counted towards amResource.
> cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393374#comment-14393374
 ] 

Wangda Tan commented on YARN-3318:
--

[~cwelch],
I took a look at your latest patch as well as [~vinodkv]'s suggestions, 
comments:

*1. I prefer what Vinod suggested: split "SchedulerProcess" into 
"QueueSchedulable" and "AppSchedulable" to avoid notes in the FairScheduler 
"Schedulable" interface like:*
{code}
/** Start time for jobs in FIFO queues; meaningless for QueueSchedulables.*/
{code}
They can both inherit {{Schedulable}}. With this patch, we can limit ourselves 
to the AppSchedulable and Schedulable definitions.
Also, regarding the schedulable comparator, not every "Schedulable" fits every 
comparator; it's meaningless to do "FIFO" scheduling at the parent queue level.
I think:
{code}
Schedulable contains ResourceUsage (class) and name
In addition, AppSchedulable contains compareSubmissionOrderTo(AppSchedulable) 
and Priority
{code}

*2. The inheritance relationships between the interfaces/classes are not very 
clear to me right now; I spent some time figuring out what they do. My 
suggestion is:*
{code}
FairOrderingPolicy/FifoOrderingPolicy > OrderingPolicy
  (implements)
FairOrderingPolicy and FifoOrderingPolicy could inherit from 
AbstractOrderingPolicy with common implementations

FairOrderingPolicy/FifoOrderingPolicy > 
FairSchedulableComparator/FifoSchedulableComparator
(uses)
There's no need to invent a "SchedulerComparator" interface; using the existing 
Java Comparator interface should be simple and enough.
{code}

*3. Regarding the relationship between OrderingPolicy and comparator:*
I understand the purpose of SchedulerComparator is to reduce unnecessary 
re-sorting when Schedulables are added/modified in an OrderingPolicy, but 
actually we can:
1) Do this in the OrderingPolicy itself. For example, with my suggestion above, 
FifoOrderingPolicy would simply ignore container-changed notifications.
2) A Comparator doesn't know about global info; only the OrderingPolicy knows 
how a combination of Comparators acts, and I don't want 
containerAllocate/Release coupled into the Comparator interface.
And we don't need a separate "CompoundComparator"; this can be put in 
AbstractOrderingPolicy.
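
As a concrete illustration of points 2 and 3, here is a minimal sketch of a FIFO 
policy built directly on java.util.Comparator that ignores container-changed 
notifications; the entity type and method names are assumptions, not the actual 
patch.
{code}
import java.util.Collection;
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

// Illustrative sketch: an ordering policy using a plain java.util.Comparator
// that simply ignores container allocate/release notifications, since FIFO
// order never changes when containers come and go.
public class FifoOrderingPolicySketch<S extends Comparable<S>> {
  private final Comparator<S> fifoComparator = new Comparator<S>() {
    @Override
    public int compare(S a, S b) {
      return a.compareTo(b);   // e.g. submission order for applications
    }
  };

  private final ConcurrentSkipListSet<S> entities =
      new ConcurrentSkipListSet<S>(fifoComparator);

  public void addSchedulableEntity(S s) { entities.add(s); }

  public void removeSchedulableEntity(S s) { entities.remove(s); }

  public Collection<S> getAssignmentIterator() { return entities; }

  // FIFO order is unaffected by container changes, so these are no-ops.
  public void containerAllocated(S s) { }
  public void containerReleased(S s) { }
}
{code}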

*4. Regarding configuration (CapacitySchedulerConfiguration):*
I think we don't need ORDERING_POLICY_CLASS; two options for a very similar 
purpose can confuse users. I suggest keeping only ordering-policy, and its name 
can be:
"fifo", "fair", regardless of its internal "comparator" implementation. And in 
the future we can add "priority-fifo", "priority-fair". (Note the "-" in the 
name doesn't mean "AND" only; it could be a collaboration of the two instead of 
a simple combination.)
If the user specifies a name not in the white-list of short names given by us, 
we will try to load a class with that name.

*5. Regarding the longer-term plan, LimitPolicy:*
This part doesn't seem well discussed yet, so to limit the scope of this JIRA, 
I think its definition and implementation should happen in a separate ticket.
For the longer-term plan, considering YARN-2986 as well, we might configure a 
queue like the following:
{code}




fair


  true
  50


   ..
   ..






{code}
The changes this patch makes in CapacitySchedulerConfiguration seem reasonable; 
as Craig mentioned, simply marking them as unstable or experimental should be 
enough. The longer-term goal is to define and stabilize YARN-2986 to make a 
truly unified scheduler.

*6. Regarding scope of this JIRA*
I think it should be fine to make policy interfaces define as well as 
CapacityScheduler changes together with this patch (only for 
FifoOrderingPolicy), it's good to see how interfaces and policies work in CS, 
is it easy or not, etc. =
And I suggest moving the following to a separate ticket:
1) UI (Web and CLI)
2) REST
3) PB-related changes
That way, as the patch keeps changing, you don't have to maintain the above 
changes together with the patch.

Please feel free to let me know your thoughts.

> Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
> LeafQueue supporting present behavior
> ---
>
> Key: YARN-3318
> URL: https://issues.apache.org/jira/browse/YARN-3318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
> YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
> YARN-3318.36.patch, YARN-3318.39.patch
>
>

[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3334:
--
Attachment: YARN-3334.7.patch

The last patch looks good to me, but I undid some unnecessary changes in 
TimelineClientImpl (which seem to have been added for code debugging). I will 
hold the patch for a while before committing, in case other folks want to take 
a look.

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393347#comment-14393347
 ] 

Vinod Kumar Vavilapalli commented on YARN-3318:
---

bq. I think it is useful to split off CS changes into their own JIRA. We can 
strictly focus on the policy framework here.
You missed this, let's please do this.

bq. well, I'd actually talked Wangda Tan into SchedulerProcess So, we can chew 
on this a bit more & see where we go
SchedulerProcess is definitely misleading. It seems to point to a process that 
is doing scheduling. What you need is a Schedulable  / SchedulableEntity / 
Consumer etc. You could also say SchedulableProcess, but Process is way too 
overloaded.

bq.  The goal is to make this available quickly but iteratively, keeping the 
changes small but making them available for use and feedback. (..) We should 
grow it organically, gradually, iteratively, think of it is a facet of the 
policy framework hooked up and available but with more to follow
I agree with this, but we are not yet in a position to support the APIs, CLI, and
config names in a supportable manner. They may or may not change depending on how
parent queue policies and limit policies evolve. For that reason alone, I am
saying that (1) don't make the configurations public yet, or put a warning
saying that they are unstable, and (2) don't expose them in the CLI / REST APIs yet.
It's okay to put them in the web UI; web UI scraping is not a contract.

bq. You add/remove applications to/from LeafQueue's policy but 
addition/removal of containers is an event...
bq. This has been factored differently along Wangda Tan's suggestion, it should 
now be consistent
It's a bit better now, although we are hard-coding Containers. We can revisit this
later.

Other comments
 - SchedulerApplicationAttempt.getDemand() should be private.
 - SchedulerProcess
-- updateCaches() -> updateState() / updateSchedulingState() as that is 
what it is doing?
-- getCachedConsumption() / getCachedDemand(): simply getCurrent*() ?
 - SchedulerComparator
  -- We aren't comparing Schedulers. Given the current name, it should have
been SchedulerProcessComparator, but SchedulerProcess itself should be renamed
as mentioned before.
  -- What is the need for reorderOnContainerAllocate() /
reorderOnContainerRelease()?
 - Move all the comparator-related classes into their own package.
 - SchedulerComparatorPolicy
  -- This is really a ComparatorBasedOrderingPolicy. Do we really foresee a
non-comparator-based ordering policy? We are unnecessarily adding two
abstractions - policies and comparators.
  -- Use className.getName() instead of hardcoded strings like
"org.apache.hadoop.yarn.server.resourcemanager.scheduler.policy.FifoComparator"
(see the snippet below).

> Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
> LeafQueue supporting present behavior
> ---
>
> Key: YARN-3318
> URL: https://issues.apache.org/jira/browse/YARN-3318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
> YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
> YARN-3318.36.patch, YARN-3318.39.patch
>
>
> Create the initial framework required for using OrderingPolicies with 
> SchedulerApplicaitonAttempts and integrate with the CapacityScheduler.   This 
> will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue

2015-04-02 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393351#comment-14393351
 ] 

zhihai xu commented on YARN-3415:
-

[~sandyr], thanks for the review. The latest patch YARN-3415.002.patch is
rebased on the latest code base and it passed the Jenkins run. Let me know
whether you have more comments on the patch.

> Non-AM containers can be counted towards amResourceUsage of a fairscheduler 
> queue
> -
>
> Key: YARN-3415
> URL: https://issues.apache.org/jira/browse/YARN-3415
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
> YARN-3415.002.patch
>
>
> We encountered this problem while running a spark cluster. The 
> amResourceUsage for a queue became artificially high and then the cluster got 
> deadlocked because the maxAMShare constrain kicked in and no new AM got 
> admitted to the cluster.
> I have described the problem in detail here: 
> https://github.com/apache/spark/pull/5233#issuecomment-87160289
> In summary - the condition for adding the container's memory towards 
> amResourceUsage is fragile. It depends on the number of live containers 
> belonging to the app. We saw that the spark AM went down without explicitly 
> releasing its requested containers and then one of those containers memory 
> was counted towards amResource.
> cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2015-04-02 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-3388:
-
Attachment: YARN-3388-v1.patch

Hi [~leftnoteasy]. Uploaded a new version of the patch that addresses the
inefficiency and adds unit tests.

I think label support is better left for a separate JIRA, once labels are fully
working with user limits.

> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up-to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393264#comment-14393264
 ] 

Zhijie Shen commented on YARN-3391:
---

[~vrushalic], it sounds good to me to set aside the disagreement on the flow
name default and move on. As far as I can tell, with the current context info
data flow, it's quite simple to change the default value if we figure out a
better one later. In addition, the previous debate is also related to how we show
flows on the web UI by default. I think we can revisit the defaults
once we reach the web UI work, when we should have a better idea about them.

> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3440) ResourceUsage should be copy-on-write

2015-04-02 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3440:

Description: 
In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}},
even though it is thread-safe, the Resource returned by its getters could be updated
by another thread.

All Resource objects in ResourceUsage should be copy-on-write: a reader will
always get an unchanged Resource, and changes applied to a Resource acquired by the
caller will not affect the original Resource.

  was:
In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage
}}, even if it is thread-safe, but Resource returned by getters could be 
updated by another thread.

All Resource objects in ResourceUsage should be copy-on-write, reader will 
always get a non-changed Resource. And changes apply on Resource acquired by 
caller will not affect original Resource.


> ResourceUsage should be copy-on-write
> -
>
> Key: YARN-3440
> URL: https://issues.apache.org/jira/browse/YARN-3440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler, yarn
>Reporter: Wangda Tan
>Assignee: Li Lu
>
> In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}}, 
> even if it is thread-safe, but Resource returned by getters could be updated 
> by another thread.
> All Resource objects in ResourceUsage should be copy-on-write, reader will 
> always get a non-changed Resource. And changes apply on Resource acquired by 
> caller will not affect original Resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-02 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3439:
-
Attachment: YARN-3439.001.patch

Daryn is out so posting a prototype patch he developed to get some early 
feedback.  Note that this patch can't go in as-is, as it's a work-in-progress 
that hacks out the automatic HDFS delegation token logic that was added as part 
of YARN-2704.

Essentially the idea is to reference count the tokens and only attempt to 
cancel them when the token is no longer referenced.  Since the launcher job 
won't complete until it has successfully submitted the sub-job(s), the token 
will remain referenced throughout the lifespan of the workflow even if the 
launcher job exits early.
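As a minimal, self-contained sketch of that reference-counting idea (not Daryn's
actual patch; the Token type parameter and cancelToken() stub below are
placeholders):
{code}
// Illustrative reference counting for delegation tokens: cancellation is only
// attempted once no application references the token any more.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TokenRefCounter<Token> {
  private final Map<Token, AtomicInteger> refCounts =
      new ConcurrentHashMap<Token, AtomicInteger>();

  /** Called when an application using this token is submitted. */
  public void addReference(Token t) {
    refCounts.putIfAbsent(t, new AtomicInteger(0));
    refCounts.get(t).incrementAndGet();
  }

  /** Called when an application using this token finishes. */
  public void removeReference(Token t) {
    AtomicInteger count = refCounts.get(t);
    if (count != null && count.decrementAndGet() <= 0) {
      refCounts.remove(t);
      cancelToken(t);  // only cancel once nothing references the token
    }
  }

  protected void cancelToken(Token t) {
    // stub: real code would invoke the token's cancel logic against the issuer
  }
}
{code}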

> RM fails to renew token when Oozie launcher leaves before sub-job finishes
> --
>
> Key: YARN-3439
> URL: https://issues.apache.org/jira/browse/YARN-3439
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: YARN-3439.001.patch
>
>
> When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
> linger waiting for the sub-job to finish.  At that point the RM stops 
> renewing delegation tokens for the launcher job which wreaks havoc on the 
> sub-job if the sub-job runs long enough for the tokens to expire.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI

2015-04-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393194#comment-14393194
 ] 

Craig Welch commented on YARN-3293:
---

Hey [~vvasudev], it seems that the patch doesn't apply cleanly; can you update it
to the latest trunk?

> Track and display capacity scheduler health metrics in web UI
> -
>
> Key: YARN-3293
> URL: https://issues.apache.org/jira/browse/YARN-3293
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, 
> apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch
>
>
> It would be good to display metrics that let users know about the health of 
> the capacity scheduler in the web UI. Today it is hard to get an idea if the 
> capacity scheduler is functioning correctly. Metrics such as the time for the 
> last allocation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3334:
-
Attachment: YARN-3334-v6.patch

Incorporated [~zjshen]'s comments in the v6 patch. Rebased it onto the latest
YARN-2928 branch and verified that the e2e test passes. [~zjshen], can you take
another look? Thanks!

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3440) ResourceUsage should be copy-on-write

2015-04-02 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3440:


 Summary: ResourceUsage should be copy-on-write
 Key: YARN-3440
 URL: https://issues.apache.org/jira/browse/YARN-3440
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler, yarn
Reporter: Wangda Tan


In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}},
even though it is thread-safe, the Resource returned by its getters could be updated
by another thread.

All Resource objects in ResourceUsage should be copy-on-write: a reader will
always get an unchanged Resource, and changes applied to a Resource acquired by the
caller will not affect the original Resource.
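A minimal sketch of the defensive-copy idea (field names and locking granularity
here are assumptions, not the actual ResourceUsage implementation):
{code}
// Illustrative only: the getter hands out a clone under a read lock, so callers
// never observe (or cause) concurrent mutation of the internal Resource.
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class CopyOnReadUsage {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Resource used = Resource.newInstance(0, 0);

  /** Returns a copy, so later internal updates cannot change what the caller holds. */
  public Resource getUsed() {
    lock.readLock().lock();
    try {
      return Resources.clone(used);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Internal updates modify the private copy under the write lock. */
  public void incUsed(Resource delta) {
    lock.writeLock().lock();
    try {
      Resources.addTo(used, delta);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}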



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-02 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393150#comment-14393150
 ] 

Vrushali C commented on YARN-3391:
--

Hi [~zjshen]

In the interest of time, I think we should set these disagreements aside and move
forward with your defaults. If the need arises, we could revisit the defaults in
the future. What do you all think? cc [~sjlee0]

thanks
Vrushali

> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393148#comment-14393148
 ] 

Hadoop QA commented on YARN-2901:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12709021/apache-yarn-2901.5.patch
  against trunk revision 9ed43f2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7199//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7199//console

This message is automatically generated.

> Add errors and warning stats to RM, NM web UI
> -
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
> apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3374) Collector's web server should randomly bind an available port

2015-04-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-3374.
--
   Resolution: Fixed
Fix Version/s: YARN-2928
 Hadoop Flags: Reviewed

Committed this to branch YARN-2928. Thanks [~zjshen] for the patch!

> Collector's web server should randomly bind an available port
> -
>
> Key: YARN-3374
> URL: https://issues.apache.org/jira/browse/YARN-3374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: YARN-2928
>
> Attachments: YARN-3347.1.patch
>
>
> It's based on the configuration now. The approach won't work if we move to 
> the app-level aggregator container solution. One NM may start multiple such 
> aggregators, which cannot all bind to the same configured port.
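For reference, a generic way to pick an ephemeral free port is to bind to port 0
and let the OS choose; this only illustrates the idea and is not necessarily how
the committed patch wires the collector's web server:
{code}
// Illustrative only: ask the OS for any available port by binding to port 0.
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
  public static int findFreePort() throws IOException {
    ServerSocket socket = new ServerSocket(0);  // 0 = any available port
    try {
      return socket.getLocalPort();
    } finally {
      socket.close();  // release it so the real web server can bind it
    }
  }
}
{code}
Note that there is a small race between closing the probe socket and the real
server binding the port, which is why binding the server itself to port 0 is
usually preferable.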



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3440) ResourceUsage should be copy-on-write

2015-04-02 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-3440:
---

Assignee: Li Lu

> ResourceUsage should be copy-on-write
> -
>
> Key: YARN-3440
> URL: https://issues.apache.org/jira/browse/YARN-3440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler, yarn
>Reporter: Wangda Tan
>Assignee: Li Lu
>
> In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage
> }}, even if it is thread-safe, but Resource returned by getters could be 
> updated by another thread.
> All Resource objects in ResourceUsage should be copy-on-write, reader will 
> always get a non-changed Resource. And changes apply on Resource acquired by 
> caller will not affect original Resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393139#comment-14393139
 ] 

Zhijie Shen commented on YARN-3390:
---

I think it makes sense to generalize TimelineEntityContext from a single app's
context to an app -> context map. The reader may need this map too. I'll fix the
problem after YARN-3391 is done.

> RMTimelineCollector should have the context info of each app
> 
>
> Key: YARN-3390
> URL: https://issues.apache.org/jira/browse/YARN-3390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> RMTimelineCollector should have the context info of each app whose entity  
> has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393116#comment-14393116
 ] 

Hudson commented on YARN-3430:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2101 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2101/])
YARN-3430. Made headroom data available on app attempt page of RM WebUI. 
Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* hadoop-yarn-project/CHANGES.txt


> RMAppAttempt headroom data is missing in RM Web UI
> --
>
> Key: YARN-3430
> URL: https://issues.apache.org/jira/browse/YARN-3430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3430.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393122#comment-14393122
 ] 

Hudson commented on YARN-3425:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2101 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2101/])
YARN-3425. NPE from RMNodeLabelsManager.serviceStop when 
NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: 
rev 492239424a3ace9868b6154f44a0f18fa5318235)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt


> NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
> failed
> --
>
> Key: YARN-3425
> URL: https://issues.apache.org/jira/browse/YARN-3425
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: 1 RM, 1 NM , 1 NN , I DN
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3425.001.patch
>
>
> Configure yarn.node-labels.enabled to true 
> and yarn.node-labels.fs-store.root-dir /node-labels
> Start resource manager without starting DN/NM
> {quote}
> 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When 
> stopping the service 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207)
> {quote}
> {code}
>  protected void stopDispatcher() {
> AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
>asyncDispatcher.stop(); 
>   }
> {code}
> Null check missing during stop
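For reference, a null-guarded version of the stop path quoted above would look
roughly like the following (illustrative; it may differ from the committed fix):
{code}
protected void stopDispatcher() {
  AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
  if (asyncDispatcher != null) {
    // the dispatcher may never have been created if serviceInit() failed early
    asyncDispatcher.stop();
  }
}
{code}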



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393105#comment-14393105
 ] 

Jason Lowe commented on YARN-3439:
--

This appears to be caused by YARN-2704.  YARN-2964 tried to fix it but assumed 
that Oozie launcher jobs will always run for the duration of the sub-jobs.  
This is true when the launcher runs a Pig job, but apparently is not true when 
it runs a standard MapReduce job.

> RM fails to renew token when Oozie launcher leaves before sub-job finishes
> --
>
> Key: YARN-3439
> URL: https://issues.apache.org/jira/browse/YARN-3439
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Daryn Sharp
>Priority: Blocker
>
> When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
> linger waiting for the sub-job to finish.  At that point the RM stops 
> renewing delegation tokens for the launcher job which wreaks havoc on the 
> sub-job if the sub-job runs long enough for the tokens to expire.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-02 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3439:


 Summary: RM fails to renew token when Oozie launcher leaves before 
sub-job finishes
 Key: YARN-3439
 URL: https://issues.apache.org/jira/browse/YARN-3439
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Jason Lowe
Assignee: Daryn Sharp
Priority: Blocker


When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
linger waiting for the sub-job to finish.  At that point the RM stops renewing 
delegation tokens for the launcher job which wreaks havoc on the sub-job if the 
sub-job runs long enough for the tokens to expire.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-04-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393095#comment-14393095
 ] 

Naganarasimha G R commented on YARN-3390:
-

Hi [~zjshen], [~djp] & [~sjlee0],
The {{TimelineCollector.getTimelineEntityContext()}} interface will not be suitable
for the RMTimelineCollector, as it will be posting/putting entities for multiple
apps, app attempts and containers. Hence I was initially planning to modify this
method to take a {{TimelineEntity.Identifier}} as a parameter, and in
RMTimelineCollector to hold a map of {{TimelineEntity.Identifier to
AppId}} and another map of {{AppId to TimelineEntityContext}} (required as the
context is created per app when the appCreatedEvent occurs).
But one more conflict I can see is that AppLevelTimelineCollector is specific
to an app, so invoking {{getTimelineEntityContext}} in
{{putEntities(TimelineEntities, Ugi)}} is fine, because all the
entities which are posted can be assumed to have the same context as they belong to
a single app. But in the general case (like RMTimelineCollector) it is not
guaranteed that all TimelineEntities belong to the same app (i.e. the TimelineEntities
might have different contexts). So would it be better to change the interface of
{{TimelineCollector.putEntities()}} to accept the {{TimelineEntityContext}} as a
parameter and remove the {{TimelineCollector.getTimelineEntityContext()}} method
from the interface? Please share your opinion...
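To make the proposal concrete, a rough sketch of the suggested signature change
(purely illustrative; the types and names are the ones under discussion in this
thread, not a finalized API):
{code}
// Illustrative sketch: the caller supplies the context per putEntities() call
// instead of the collector exposing a single app-level context.
public interface TimelineCollectorProposal {
  void putEntities(TimelineEntities entities,
      TimelineEntityContext context,
      UserGroupInformation callerUgi) throws Exception;

  // getTimelineEntityContext() would be dropped, since a collector such as
  // RMTimelineCollector serves entities belonging to many applications.
}
{code}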

> RMTimelineCollector should have the context info of each app
> 
>
> Key: YARN-3390
> URL: https://issues.apache.org/jira/browse/YARN-3390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> RMTimelineCollector should have the context info of each app whose entity  
> has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS

2015-04-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393089#comment-14393089
 ] 

Junping Du commented on YARN-3046:
--

Also, looks like no YARN changes get involved in this JIRA, will migrate it to 
MAPREDUCE project later.

> [Event producers] Implement MapReduce AM writing some MR metrics to ATS
> ---
>
> Key: YARN-3046
> URL: https://issues.apache.org/jira/browse/YARN-3046
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: YARN-3046-no-test.patch
>
>
> Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes 
> written) and have the MR AM write the framework-specific metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS

2015-04-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3046:
-
Attachment: YARN-3046-no-test.patch

Uploaded the first patch, not including any tests yet. Calling for an early review.
Will add tests soon.

> [Event producers] Implement MapReduce AM writing some MR metrics to ATS
> ---
>
> Key: YARN-3046
> URL: https://issues.apache.org/jira/browse/YARN-3046
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: YARN-3046-no-test.patch
>
>
> Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes 
> written) and have the MR AM write the framework-specific metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2901) Add errors and warning stats to RM, NM web UI

2015-04-02 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2901:

Attachment: apache-yarn-2901.5.patch

{quote}
I realized if we set clean-up-threshold > maxUniqueMessages, user can see it, 
how about doing clean-up in two conditions:
1) User get message, and #message > maxUniqueMessages
2) #messages > message-threshold, we can set the message-threshold to higher to 
avoid too frequent cleanup.
Sounds good?
{quote}

Makes sense; made the change.

bq. I just tried to move that, it seems no more issues happen, could you check 
that?

Moved ErrorAndWarningsBlock to hadoop-yarn-server-common. Renamed 
ErrorsAndWarningsPage in RM and NM to RMErrorsAndWarningsPage and 
NMErrorsAndWarningsPage.
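As a minimal sketch of the custom-appender approach this JIRA is built on
(illustrative only: it is a log4j 1.2 AppenderSkeleton, and names such as
maxUniqueMessages / cleanupThreshold mirror the discussion above rather than the
actual patch):
{code}
// Counts WARN/ERROR events per rendered message and trims the map once it
// grows past a threshold, so cleanup is not done on every append.
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

public class ErrorWarningCountingAppender extends AppenderSkeleton {
  private final Map<String, AtomicInteger> counts =
      new ConcurrentHashMap<String, AtomicInteger>();
  private int maxUniqueMessages = 20;   // keep roughly this many entries
  private int cleanupThreshold = 100;   // only trim once we exceed this

  @Override
  protected void append(LoggingEvent event) {
    if (!event.getLevel().isGreaterOrEqual(Level.WARN)) {
      return;                            // only track warnings and errors
    }
    String key = event.getLevel() + ": " + event.getRenderedMessage();
    counts.putIfAbsent(key, new AtomicInteger(0));
    counts.get(key).incrementAndGet();
    if (counts.size() > cleanupThreshold) {
      cleanup();
    }
  }

  private synchronized void cleanup() {
    // crude trim: drop entries until we are back under the limit
    Iterator<String> it = counts.keySet().iterator();
    while (counts.size() > maxUniqueMessages && it.hasNext()) {
      it.next();
      it.remove();
    }
  }

  @Override
  public void close() { }

  @Override
  public boolean requiresLayout() { return false; }
}
{code}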

> Add errors and warning stats to RM, NM web UI
> -
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
> apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393015#comment-14393015
 ] 

Zhijie Shen commented on YARN-3334:
---

Junping, did you have a chance to look at items 3 and 4 of my last patch
comment? One more nit: newTimelineServiceEnabled(config) ->
systemMetricsPublisherEnabled?

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, YARN-3334-v5.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3438) add a mode to replay MR job history files to the timeline service

2015-04-02 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3438:
-

 Summary: add a mode to replay MR job history files to the timeline 
service
 Key: YARN-3438
 URL: https://issues.apache.org/jira/browse/YARN-3438
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


The subtask covers the work on top of YARN-3437 to add a mode to replay MR job 
history files to the timeline service storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3437:
-

 Summary: convert load test driver to timeline service v.2
 Key: YARN-3437
 URL: https://issues.apache.org/jira/browse/YARN-3437
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


This subtask covers the work for converting the proposed patch for the load 
test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3378) a load test client that can replay a volume of history files

2015-04-02 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3378:
--
Issue Type: New Feature  (was: Sub-task)
Parent: (was: YARN-2928)

> a load test client that can replay a volume of history files
> 
>
> Key: YARN-3378
> URL: https://issues.apache.org/jira/browse/YARN-3378
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> It might be good to create a load test client that can replay a large volume 
> of history files into the timeline service. One can envision running such a 
> load test client as a mapreduce job and generate a fair amount of load. It 
> would be useful to spot check correctness, and more importantly observe 
> performance characteristic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong

2015-04-02 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3436:
---
Attachment: YARN-3436.001.patch

Attaching a patch for the same; please review.

> Doc WebServicesIntro.html Example Rest API url wrong
> 
>
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-3436.001.patch
>
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET 
> http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> Url should be ws/v1/cluster/{color:red}apps{color} .
> 2 examples on same page are wrong



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong

2015-04-02 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3436:
--

 Summary: Doc WebServicesIntro.html Example Rest API url wrong
 Key: YARN-3436
 URL: https://issues.apache.org/jira/browse/YARN-3436
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor


/docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html

{quote}

Response Examples
JSON response with single resource

HTTP Request: GET 
http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001

Response Status Line: HTTP/1.1 200 OK

{quote}

The URL should be ws/v1/cluster/{color:red}apps{color}.
Two examples on the same page are wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3435) AM container to be allocated Appattempt AM container shown as null

2015-04-02 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3435:
---
Attachment: Screenshot.png

Attaching a screenshot for the bug.

> AM container to be allocated Appattempt AM container shown as null
> --
>
> Key: YARN-3435
> URL: https://issues.apache.org/jira/browse/YARN-3435
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: 1RM,1DN
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Trivial
> Attachments: Screenshot.png
>
>
> Submit yarn application
> Open http://:8088/cluster/appattempt/appattempt_1427984982805_0003_01 
> Before the AM container is allocated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3435) AM container to be allocated Appattempt AM container shown as null

2015-04-02 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3435:
---
Attachment: YARN-3435.001.patch

> AM container to be allocated Appattempt AM container shown as null
> --
>
> Key: YARN-3435
> URL: https://issues.apache.org/jira/browse/YARN-3435
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: 1RM,1DN
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Trivial
> Attachments: Screenshot.png, YARN-3435.001.patch
>
>
> Submit yarn application
> Open http://:8088/cluster/appattempt/appattempt_1427984982805_0003_01 
> Before the AM container is allocated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3435) AM container to be allocated Appattempt AM container shown as null

2015-04-02 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3435:
--

 Summary: AM container to be allocated Appattempt AM container 
shown as null
 Key: YARN-3435
 URL: https://issues.apache.org/jira/browse/YARN-3435
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: 1RM,1DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Trivial


Submit yarn application
Open http://:8088/cluster/appattempt/appattempt_1427984982805_0003_01 
Before the AM container is allocated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392872#comment-14392872
 ] 

Hudson commented on YARN-3425:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/151/])
YARN-3425. NPE from RMNodeLabelsManager.serviceStop when 
NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: 
rev 492239424a3ace9868b6154f44a0f18fa5318235)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt


> NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
> failed
> --
>
> Key: YARN-3425
> URL: https://issues.apache.org/jira/browse/YARN-3425
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: 1 RM, 1 NM , 1 NN , I DN
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3425.001.patch
>
>
> Configure yarn.node-labels.enabled to true 
> and yarn.node-labels.fs-store.root-dir /node-labels
> Start resource manager without starting DN/NM
> {quote}
> 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When 
> stopping the service 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207)
> {quote}
> {code}
>  protected void stopDispatcher() {
> AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
>asyncDispatcher.stop(); 
>   }
> {code}
> Null check missing during stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392866#comment-14392866
 ] 

Hudson commented on YARN-3430:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/151/])
YARN-3430. Made headroom data available on app attempt page of RM WebUI. 
Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java


> RMAppAttempt headroom data is missing in RM Web UI
> --
>
> Key: YARN-3430
> URL: https://issues.apache.org/jira/browse/YARN-3430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3430.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392784#comment-14392784
 ] 

Hudson commented on YARN-3430:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/142/])
YARN-3430. Made headroom data available on app attempt page of RM WebUI. 
Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java


> RMAppAttempt headroom data is missing in RM Web UI
> --
>
> Key: YARN-3430
> URL: https://issues.apache.org/jira/browse/YARN-3430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3430.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

