[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-16 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060752#comment-15060752
 ] 

Varun Saxena commented on YARN-4238:


Quick update based on the discussion we had on the call.

We decided to 

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from the RM and elsewhere, we are not sending the 
> created time. For instance, the created time in the TimelineServiceV2Publisher 
> class, and in other such similar classes for other entities, is not updated. 
> We can easily update the created time when sending the application-created 
> event. Likewise for the modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3102) Decommissioned Nodes not listed in Web UI

2015-12-16 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060751#comment-15060751
 ] 

Daniel Templeton commented on YARN-3102:


Sounds like a generally reasonable approach. I definitely agree that the 
exclude list will need to be re-read on a host-files refresh.

> Decommissioned Nodes not listed in Web UI
> 
>
> Key: YARN-3102
> URL: https://issues.apache.org/jira/browse/YARN-3102
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
> Environment: 2 Node Manager and 1 Resource Manager 
>Reporter: Bibin A Chundatt
>Assignee: Kuhu Shukla
>Priority: Minor
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to point 
> to a yarn.exclude file on the RM1 machine.
> Add the NM1 host name to yarn.exclude.
> Start the nodes listed below: NM1, NM2, and the Resource Manager.
> Now check the decommissioned nodes in /cluster/nodes.
> The number of decommissioned nodes is listed as 1, but the table is empty in 
> /cluster/nodes/decommissioned (details of the decommissioned node are not 
> shown).
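
For reference, a minimal reproduction configuration might look like the 
following sketch (the property name is the real one from the description; the 
file path is illustrative):

{code}
<!-- yarn-site.xml on the RM1 machine; the path is illustrative -->
<property>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value>/etc/hadoop/conf/yarn.exclude</value>
</property>
{code}

The yarn.exclude file would then contain the NM1 host name, one host per line.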



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-16 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060792#comment-15060792
 ] 

Varun Saxena commented on YARN-4238:


Quick update based on the discussion we had on the call.

We decided to hold off committing this JIRA for a while. Based on today's 
discussion, the scope of this JIRA will change a bit, and the title will have 
to change accordingly as well.
Coming to the discussion,
# Created time would still be updated from the client.
# We discussed what modified time actually means and what it is to be used 
for, and whether we should rely on the client filling it or the server should 
fill it. It came across that modified time was kept as part of the entity 
mainly for debugging purposes. And from a debugging perspective, it makes more 
sense to know when a particular Put was made into HBase than to use the value 
reported by the client.
# As per the discussion till now, modified time can still be filled by the 
client, but it will be ignored by the server, and the HBase cell timestamp 
will be used to fill the modified time in the entity response. As part of this 
proposal, we will drop the modified time column from the entity and 
application tables, and will fill the modified time based on the cells 
returned (for an entity, based on the fields returned).
One point to ponder here, though, is that on the reader side we filter rows 
based on modified time. Currently this is done after fetching records from 
HBase. If we derive modified time like this, the filtering based on it will 
again be on the basis of the fields to return rather than the overall entity's 
modified time.

cc [~sjlee0], [~jrottinghuis], [~vrushalic]. Anything more you want to add?

Naga, you can discuss with me offline for details.
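
A minimal sketch of the server-side derivation described above, assuming 
modified time is taken as the newest HBase cell timestamp among the cells 
actually fetched (the class and method names are illustrative, not from a 
patch):

{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;

final class ModifiedTimeSketch {
  /** Newest cell timestamp among the returned cells; with field filtering,
   *  this reflects only the fields actually fetched, as noted above. */
  static long deriveModifiedTime(Result result) {
    long latest = 0L;
    for (Cell cell : result.rawCells()) {
      // HBase stamps each cell with the server-side write time of its Put.
      latest = Math.max(latest, cell.getTimestamp());
    }
    return latest;
  }
}
{code}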

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from the RM and elsewhere, we are not sending the 
> created time. For instance, the created time in the TimelineServiceV2Publisher 
> class, and in other such similar classes for other entities, is not updated. 
> We can easily update the created time when sending the application-created 
> event. Likewise for the modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-16 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase reassigned YARN-4257:


Assignee: Rich Haase

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> -
>
> Key: YARN-4257
> URL: https://issues.apache.org/jira/browse/YARN-4257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Swapnil Daingade
>Assignee: Rich Haase
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a 
> private method, void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for 
> CPU and memory (with minor differences). FairScheduler supports 0 as the 
> minimum allocation for CPU and memory, while CapacityScheduler and 
> FifoScheduler do not. We can move this code to AbstractYarnScheduler (which 
> avoids code duplication) and make it protected for individual schedulers to 
> override.
> Why do we care about a minimum allocation of 0 for CPU and memory?
> We contribute to a project called Apache Myriad that runs YARN on Mesos. 
> Myriad supports a feature called fine-grained scaling (FGS). In FGS, an NM is 
> launched with zero capacity (0 CPU and 0 memory). When a YARN container is to 
> be run on the NM, a Mesos offer for that node is accepted and the NM capacity 
> is dynamically scaled up to match the accepted offer. On completion of the 
> YARN container, resources are returned to Mesos and the NM capacity is scaled 
> back down to zero (CPU and memory).
> In ResourceTrackerService.registerNodeManager, YARN checks whether the NM 
> capacity is at least as much as yarn.scheduler.minimum-allocation-mb and 
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in 
> yarn-site.xml (so a zero-capacity NM is possible). However, the validateConf 
> methods in CapacityScheduler and FifoScheduler do not allow 0 values for 
> these properties (the FairScheduler one does). This behaviour should be 
> consistent, or at least overridable.
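
A hedged sketch of the refactoring the description proposes, simplified to the 
memory dimension only (class names, defaults, and exception text here are 
illustrative, not from a patch):

{code}
import org.apache.hadoop.conf.Configuration;

abstract class AbstractSchedulerSketch {
  // Shared validation, overridable by concrete schedulers.
  protected void validateConf(Configuration conf) {
    int minMem = conf.getInt("yarn.scheduler.minimum-allocation-mb", 1024);
    int maxMem = conf.getInt("yarn.scheduler.maximum-allocation-mb", 8192);
    if (minMem <= 0 || minMem > maxMem) { // Capacity/Fifo-style: 0 rejected
      throw new IllegalArgumentException("Invalid scheduler memory"
          + " allocation: min=" + minMem + ", max=" + maxMem);
    }
  }
}

class ZeroMinSchedulerSketch extends AbstractSchedulerSketch {
  // FairScheduler-style override: a 0 minimum is legal, which is what a
  // zero-capacity NM under Myriad fine-grained scaling needs.
  @Override
  protected void validateConf(Configuration conf) {
    int minMem = conf.getInt("yarn.scheduler.minimum-allocation-mb", 0);
    int maxMem = conf.getInt("yarn.scheduler.maximum-allocation-mb", 8192);
    if (minMem < 0 || minMem > maxMem) {
      throw new IllegalArgumentException("Invalid scheduler memory"
          + " allocation: min=" + minMem + ", max=" + maxMem);
    }
  }
}
{code}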



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status

2015-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060726#comment-15060726
 ] 

Hudson commented on YARN-4207:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8977 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8977/])
YARN-4207. Add a non-judgemental YARN app completion status. Contributed 
(sseth: rev 0f708d465fbc4260f2c36e8067e27cd8b285fde7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/FinalApplicationStatus.java
* hadoop-yarn-project/CHANGES.txt
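
The touched files suggest the change is a new enum constant in 
FinalApplicationStatus plus its protobuf counterpart. A hedged sketch of what 
such an addition might look like (the constant name ENDED is an assumption; 
this digest does not show the patch contents):

{code}
// Sketch only; the real enum lives in
// org.apache.hadoop.yarn.api.records.FinalApplicationStatus.
public enum FinalApplicationStatus {
  UNDEFINED,  // application has not yet finished
  SUCCEEDED,
  FAILED,
  KILLED,
  ENDED       // assumed: finished without implying success or failure
}
{code}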


> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Rich Haase
>  Labels: trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4207.patch
>
>
> For certain applications, it doesn't make sense to have a SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some of which have failed; there's no clear status for the 
> session, both logically and from the user's perspective (users are confused 
> either way). There needs to be a status that does not imply success or 
> failure, such as "done"/"ended"/"finished".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4420) Add REST API for List Reservations

2015-12-16 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4420:
--
Attachment: YARN-4420.v1.patch

First version of the patch. This is dependent on YARN-4340, so I have not made 
the patch available. 

> Add REST API for List Reservations
> --
>
> Key: YARN-4420
> URL: https://issues.apache.org/jira/browse/YARN-4420
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, fairscheduler, resourcemanager
>Reporter: Sean Po
>Assignee: Sean Po
>Priority: Minor
> Attachments: YARN-4420.v1.patch
>
>
> This JIRA tracks changes to the REST APIs of the reservation system and 
> enables querying which reservations exist by time-range and reservation-id.
> This task has a dependency on YARN-4340.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4466) ResourceManager should tolerate unexpected exceptions to happen in non-critical subsystem/services like SystemMetricsPublisher

2015-12-16 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060311#comment-15060311
 ] 

Varun Saxena commented on YARN-4466:


Assigned to Naga

> ResourceManager should tolerate unexpected exceptions to happen in 
> non-critical subsystem/services like SystemMetricsPublisher
> --
>
> Key: YARN-4466
> URL: https://issues.apache.org/jira/browse/YARN-4466
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>
> From my comment in YARN-4452 
> (https://issues.apache.org/jira/browse/YARN-4452?focusedCommentId=15059805&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15059805), 
> we should make the RM more robust by ignoring (but logging) unexpected 
> exceptions in its non-critical subsystems/services.
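
A minimal sketch of the idea, assuming a generic guard around non-critical 
event handling (the class and method names are illustrative, not from a 
patch):

{code}
import org.apache.commons.logging.Log;

final class NonCriticalGuard {
  /** Run a non-critical action (e.g. a SystemMetricsPublisher event);
   *  log and swallow unexpected exceptions instead of crashing the RM. */
  static void runQuietly(Runnable action, Log log) {
    try {
      action.run();
    } catch (Exception e) {
      log.warn("Ignoring unexpected exception in non-critical service", e);
    }
  }
}
{code}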



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4466) ResourceManager should tolerate unexpected exceptions to happen in non-critical subsystem/services like SystemMetricsPublisher

2015-12-16 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4466:
---
Assignee: Naganarasimha G R  (was: Varun Saxena)

> ResourceManager should tolerate unexpected exceptions to happen in 
> non-critical subsystem/services like SystemMetricsPublisher
> --
>
> Key: YARN-4466
> URL: https://issues.apache.org/jira/browse/YARN-4466
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>
> From my comment in YARN-4452 
> (https://issues.apache.org/jira/browse/YARN-4452?focusedCommentId=15059805&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15059805), 
> we should make the RM more robust by ignoring (but logging) unexpected 
> exceptions in its non-critical subsystems/services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3586) RM only get back addresses of Collectors that NM needs to know.

2015-12-16 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3586:
-
Attachment: YARN-3586-feature-YARN-2928.patch

Updated the patch with a proper unit test.

> RM only get back addresses of Collectors that NM needs to know.
> ---
>
> Key: YARN-3586
> URL: https://issues.apache.org/jira/browse/YARN-3586
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3586-demo.patch, YARN-3586-feature-YARN-2928.patch
>
>
> After YARN-3445, the RM caches runningApps for each NM, so the RM heartbeat 
> back to an NM should only include collectors' addresses for applications 
> running on that specific NM.
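
A hedged sketch of the heartbeat-side filtering this implies (the method and 
parameter names are illustrative, not from the patch):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.ApplicationId;

final class CollectorFilterSketch {
  /** Keep only collector addresses for apps the RM knows are running on
   *  the heartbeating NM, instead of sending the full collector map. */
  static Map<ApplicationId, String> collectorsForNode(
      Set<ApplicationId> runningAppsOnNode,
      Map<ApplicationId, String> allCollectors) {
    Map<ApplicationId, String> forNode = new HashMap<ApplicationId, String>();
    for (ApplicationId appId : runningAppsOnNode) {
      String addr = allCollectors.get(appId);
      if (addr != null) {
        forNode.put(appId, addr);
      }
    }
    return forNode;
  }
}
{code}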



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4423) Cleanup lint warnings in resource manager

2015-12-16 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-4423:
---
Attachment: YARN-4423.003.patch

Rebased

> Cleanup lint warnings in resource manager
> --
>
> Key: YARN-4423
> URL: https://issues.apache.org/jira/browse/YARN-4423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-4423.001.patch, YARN-4423.002.patch, 
> YARN-4423.003.patch
>
>
> There are multiple lint warnings about unchecked usage.  This JIRA is to 
> clean them up, and maybe a few other quibbles as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3586) RM only get back addresses of Collectors that NM needs to know.

2015-12-16 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3586:
-
Priority: Critical  (was: Major)

> RM only get back addresses of Collectors that NM needs to know.
> ---
>
> Key: YARN-3586
> URL: https://issues.apache.org/jira/browse/YARN-3586
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3586-demo.patch, YARN-3586-feature-YARN-2928.patch
>
>
> After YARN-3445, the RM caches runningApps for each NM, so the RM heartbeat 
> back to an NM should only include collectors' addresses for applications 
> running on that specific NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2

2015-12-16 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060757#comment-15060757
 ] 

Varun Saxena commented on YARN-4238:


Sorry... submitted the comment by mistake.

> createdTime and modifiedTime is not reported while publishing entities to 
> ATSv2
> ---
>
> Key: YARN-4238
> URL: https://issues.apache.org/jira/browse/YARN-4238
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4238-YARN-2928.01.patch, 
> YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, 
> YARN-4238-feature-YARN-2928.02.patch
>
>
> While publishing entities from the RM and elsewhere, we are not sending the 
> created time. For instance, the created time in the TimelineServiceV2Publisher 
> class, and in other such similar classes for other entities, is not updated. 
> We can easily update the created time when sending the application-created 
> event. Likewise for the modification time on every write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-16 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060808#comment-15060808
 ] 

Rich Haase commented on YARN-4257:
--

Makes sense to me.

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> -
>
> Key: YARN-4257
> URL: https://issues.apache.org/jira/browse/YARN-4257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Swapnil Daingade
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a 
> private method, void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for 
> CPU and memory (with minor differences). FairScheduler supports 0 as the 
> minimum allocation for CPU and memory, while CapacityScheduler and 
> FifoScheduler do not. We can move this code to AbstractYarnScheduler (which 
> avoids code duplication) and make it protected for individual schedulers to 
> override.
> Why do we care about a minimum allocation of 0 for CPU and memory?
> We contribute to a project called Apache Myriad that runs YARN on Mesos. 
> Myriad supports a feature called fine-grained scaling (FGS). In FGS, an NM is 
> launched with zero capacity (0 CPU and 0 memory). When a YARN container is to 
> be run on the NM, a Mesos offer for that node is accepted and the NM capacity 
> is dynamically scaled up to match the accepted offer. On completion of the 
> YARN container, resources are returned to Mesos and the NM capacity is scaled 
> back down to zero (CPU and memory).
> In ResourceTrackerService.registerNodeManager, YARN checks whether the NM 
> capacity is at least as much as yarn.scheduler.minimum-allocation-mb and 
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in 
> yarn-site.xml (so a zero-capacity NM is possible). However, the validateConf 
> methods in CapacityScheduler and FifoScheduler do not allow 0 values for 
> these properties (the FairScheduler one does). This behaviour should be 
> consistent, or at least overridable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status

2015-12-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060611#comment-15060611
 ] 

Siddharth Seth commented on YARN-4207:
--

+1. This looks good. Thanks [~rhaase]

> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Rich Haase
>  Labels: trivial
> Attachments: YARN-4207.patch
>
>
> For certain applications, it doesn't make sense to have a SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some of which have failed; there's no clear status for the 
> session, both logically and from the user's perspective (users are confused 
> either way). There needs to be a status that does not imply success or 
> failure, such as "done"/"ended"/"finished".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4293:
-
Description: In order to get resource utilization information more easily, the 
"yarn node" CLI should include resource utilization on the node.

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4293.patch, 0002-YARN-4293.patch, 
> 0003-YARN-4293.patch
>
>
> In order to get resource utilization information more easily, the "yarn 
> node" CLI should include resource utilization on the node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3586) RM only get back addresses of Collectors that NM needs to know.

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060794#comment-15060794
 ] 

Hadoop QA commented on YARN-3586:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
35s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
30s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 1s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 26s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 145m 59s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:7c86163 |
| JIRA Patch 

[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2015-12-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060881#comment-15060881
 ] 

Wangda Tan commented on YARN-4108:
--

Thanks [~curino] for the comments!

I think all your suggestions/concerns make sense to me.

I'm thinking about whether it is possible to find a way to combine the two 
approaches (normal and surgical), as suggested by [~chris.douglas]. I feel a 
combination of the two approaches could address all your concerns. Will keep 
you updated.

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-v1.pdf, 
> YARN-4108-design-doc-v2.pdf, YARN-4108.poc.1.patch
>
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle the case of user-limit preemption
> 2) Can handle the case of resource placement requirements, such as: 
> hard-locality (I only want to use rack-1) / node-constraints (YARN-3409) / 
> black-list (I don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), 
> cross-application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Enhance filters in TimelineReader

2015-12-16 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060884#comment-15060884
 ] 

Varun Saxena commented on YARN-3863:


We can add a filter for created time too. I had missed it.

> Enhance filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch
>
>
> Currently, filters in the timeline reader will return an entity only if all 
> the filter conditions hold true, i.e., only the AND operation is supported. We 
> can support the OR operation for the filters as well. Additionally, as the 
> primary backend implementation is HBase, we can design our filters in a 
> manner where they closely resemble HBase filters.
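
A minimal sketch of how OR semantics could map onto HBase's FilterList (the 
column family and qualifier names are made up for the example):

{code}
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

final class OrFilterSketch {
  static Filter anyOf() {
    // MUST_PASS_ONE == OR; the existing AND behaviour corresponds to
    // FilterList.Operator.MUST_PASS_ALL.
    FilterList or = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    or.addFilter(valueEquals("config.key", "v1"));
    or.addFilter(valueEquals("config.key", "v2"));
    return or;
  }

  private static Filter valueEquals(String qualifier, String value) {
    return new SingleColumnValueFilter(Bytes.toBytes("i"),
        Bytes.toBytes(qualifier), CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes(value)));
  }
}
{code}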



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4423) Cleanup lint warnings in resource manager

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060437#comment-15060437
 ] 

Hadoop QA commented on YARN-4423:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 3s {color} 
| {color:red} YARN-4423 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778061/YARN-4423.003.patch |
| JIRA Issue | YARN-4423 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10007/console |


This message was automatically generated.



> Cleanup lint warnings in resource manager
> --
>
> Key: YARN-4423
> URL: https://issues.apache.org/jira/browse/YARN-4423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-4423.001.patch, YARN-4423.002.patch, 
> YARN-4423.003.patch
>
>
> There are multiple lint warnings about unchecked usage.  This JIRA is to 
> clean them up, and maybe a few other quibbles as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4423) Cleanup lint warnings in resource manager

2015-12-16 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-4423:
---
Attachment: YARN-4423.003.patch

> Cleanup lint warnings in resource manager
> --
>
> Key: YARN-4423
> URL: https://issues.apache.org/jira/browse/YARN-4423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-4423.001.patch, YARN-4423.002.patch, 
> YARN-4423.003.patch
>
>
> There are multiple lint warnings about unchecked usage.  This JIRA is to 
> clean them up, and maybe a few other quibbles as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4333) Fair scheduler should support preemption within queue

2015-12-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060461#comment-15060461
 ] 

Wangda Tan commented on YARN-4333:
--

[~Tao Jie], I've added you to the contributor list and assigned the JIRA to you.

> Fair scheduler should support preemption within queue
> -
>
> Key: YARN-4333
> URL: https://issues.apache.org/jira/browse/YARN-4333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
> Attachments: YARN-4333.001.patch
>
>
> Currently each app in the fair scheduler is allocated its fair share; 
> however, the fair-share resource is not ensured even if fairSharePreemption 
> is enabled.
> Consider:
> 1. When the cluster is idle, we submit app1 to queueA, which takes the 
> maxResource of queueA.
> 2. Then the cluster becomes busy, but app1 does not release any resources, so 
> queueA's resource usage is over its fair share.
> 3. Then we submit app2 (maybe with a higher priority) to queueA. Now app2 has 
> its own fair share but cannot obtain any resources, since queueA is still 
> over its fair share and resources will not be assigned to queueA anymore. 
> Also, preemption is not triggered in this case.
> So we should allow preemption within a queue when an app is starved for its 
> fair share.
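
For illustration only, the kind of within-queue starvation test this suggests 
might look like the following (simplified to memory; the parameter names and 
the timeout convention are assumptions, not FairScheduler API):

{code}
final class FairShareStarvationSketch {
  /** An app is considered starved for fair share when its usage has stayed
   *  below its fair share for longer than the fair-share preemption timeout. */
  static boolean starvedForFairShare(long usedMb, long fairShareMb,
      long belowFairShareSinceMs, long nowMs, long preemptionTimeoutMs) {
    return usedMb < fairShareMb
        && (nowMs - belowFairShareSinceMs) > preemptionTimeoutMs;
  }
}
{code}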



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2015-12-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060543#comment-15060543
 ] 

Carlo Curino commented on YARN-4108:


[~leftnoteasy] thanks for explaining to us what you are working on. All in 
all, it sounds like a reasonable enhancement. The things I think we should 
keep an eye out for are:
 # The "dry-run" mechanism you describe, while clever, increases the cost of 
processing NM heartbeats. We should evaluate this very carefully to make sure 
this approach can scale to large/busy clusters. It would be great to have some 
self-tuning (or manual-tuning) mechanics that allow us to leverage this only 
when it is manageable at scale.
 # Your proposed approach clashes a bit with the "non-strict" version of 
preemption, where an AM is allowed to return an equivalent amount of resources 
somewhere else. While I am not sure this is happening in the wild yet (AM-side 
support of preemption is still minimal/absent), I think it is important, 
especially as we move towards richer applications.
 # The proposed approach prevents certain preemption actions that will never 
lead to a usable container. However, since preemption has a long lag anyway 
(wait-before-kill + actual kill + dispatch of the new container to AMs), it is 
possible that the demand you decided to preempt for will already be satisfied 
by the time the resources from the preempted containers are offered (hence 
some unneeded preemption might remain). Again, this is more likely in large 
clusters where much more is happening at any one time.

Bottom line, I like the general direction, and I can see scenarios (services 
or small clusters) where this can improve things a fair bit, but we should 
make sure this works well in large/busy clusters running mostly batch jobs.
 

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-v1.pdf, 
> YARN-4108-design-doc-v2.pdf, YARN-4108.poc.1.patch
>
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle the case of user-limit preemption
> 2) Can handle the case of resource placement requirements, such as: 
> hard-locality (I only want to use rack-1) / node-constraints (YARN-3409) / 
> black-list (I don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), 
> cross-application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-16 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4304:
--
Attachment: 0005-YARN-4304.patch

Uploading a new version of the patch, as YARN-4418 is committed.
Also addressed the comments given earlier.

{noformat}
_("Max Application Master Resources Per User:",
  resourceUsages.getAMResourceLimit().toString());
{noformat}

For the per-user AM limit, we do not have any placeholder now except 
{{userInfo}}. Hence I have given the AM resource limit here, which is not 
quite correct. I could invoke a few APIs here and somehow try to get the 
corresponding userInfo object; please suggest if that is needed.
[~leftnoteasy], please help to check the same.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> REST_and_UI.zip
>
>
> As we are supporting per-partition-level max AM resource percentage 
> configuration, the UI and various metrics also need to display the correct 
> configurations related to the same.
> For example, the current UI still shows the AM-resource percentage at the 
> queue level. This is to be updated correctly when label configuration is 
> used.
> - Display max-am-percentage per partition in the Scheduler UI (for labels as 
> well) and on the ClusterMetrics page
> - Update queue/partition-related metrics w.r.t. per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4452) NPE when submit Unmanaged application

2015-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060574#comment-15060574
 ] 

Hudson commented on YARN-4452:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #8975 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8975/])
YARN-4452. NPE when submit Unmanaged application. Contributed by (junping_du: 
rev 50bd067e1d63d4c80dc1e7bf4024bfaf42cf4416)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java


> NPE when submit Unmanaged application
> -
>
> Key: YARN-4452
> URL: https://issues.apache.org/jira/browse/YARN-4452
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-4452.v1.001.patch, YARN-4452.v1.002.patch
>
>
> As reported in the forum by Wen Lin (w...@pivotal.io)
> {quote}
> [gpadmin@master simple-yarn-app]$ hadoop jar
> ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar
> Client --classpath  ./target/simple-yarn-app-1.1.0.jar -cmd "java
> com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
> {quote}
> The error is:
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type REGISTERED for applicationAttempt
> application_1450079798629_0001
> 664 java.lang.NullPointerException
> 665 at
> org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
> 666 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
> 667 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
> 668 at
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4423) Cleanup lint warnings in resource manager

2015-12-16 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060458#comment-15060458
 ] 

Daniel Templeton commented on YARN-4423:


The patch doesn't apply to trunk because it now depends on YARN-4457.  Once 
that's in, I'll run the build again.

> Cleanup lint warnings in resource manager
> --
>
> Key: YARN-4423
> URL: https://issues.apache.org/jira/browse/YARN-4423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-4423.001.patch, YARN-4423.002.patch, 
> YARN-4423.003.patch
>
>
> There are multiple lint warnings about unchecked usage.  This JIRA is to 
> clean them up, and maybe a few other quibbles as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4333) Fair scheduler should support preemption within queue

2015-12-16 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4333:
-
Assignee: Tao Jie

> Fair scheduler should support preemption within queue
> -
>
> Key: YARN-4333
> URL: https://issues.apache.org/jira/browse/YARN-4333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
> Attachments: YARN-4333.001.patch
>
>
> Currently each app in the fair scheduler is allocated its fair share; 
> however, the fair-share resource is not ensured even if fairSharePreemption 
> is enabled.
> Consider:
> 1. When the cluster is idle, we submit app1 to queueA, which takes the 
> maxResource of queueA.
> 2. Then the cluster becomes busy, but app1 does not release any resources, so 
> queueA's resource usage is over its fair share.
> 3. Then we submit app2 (maybe with a higher priority) to queueA. Now app2 has 
> its own fair share but cannot obtain any resources, since queueA is still 
> over its fair share and resources will not be assigned to queueA anymore. 
> Also, preemption is not triggered in this case.
> So we should allow preemption within a queue when an app is starved for its 
> fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4198) CapacityScheduler locking / synchronization improvements

2015-12-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060459#comment-15060459
 ] 

Carlo Curino commented on YARN-4198:


[~xinxianyin] the way we got to this was by running a "busy" workload with 
lots of reservation-related pressure on the CS, staring at a profiler, and 
progressively working out which locks could be weakened and which data 
structures could be changed to improve the performance of the scheduler.

I think this is looking at the same set of problems you are tracking in 
YARN-3091, but with a particular focus on the needs of the reservation system. 
I expect the changes in this patch (we will post an initial version soon) to 
be generally useful, and possibly partially overlapping with some of the 
YARN-3091 sub-JIRAs.

The improvements we observed were very substantial (we went from thrashing on 
locks in a 256-node cluster at 50-60 concurrent reservations to chugging along 
nicely on a 2700-node cluster at over 1000 concurrent reservations). Note that 
all that testing was done for this patch combined with the rest of the 
YARN-4193 work, therefore I suggest that:
 # We will do a round of tests of this patch in isolation to make sure the 
changes are good independently of the rest of what we did in YARN-4193.
 # We will post a version of the patch.
 # You can review it and help us figure out: 1) whether it is 
good/safe/agreeable, and 2) how it relates to some of the other efforts that 
are ongoing (it might resolve some of the sub-JIRAs or provide partial work 
towards them).

[~kshukla], [~wangda], [~jianhe], [~jlowe] if you guys have time to look at 
this as well, it would be great. As I mentioned to some of you already, this 
is a very delicate portion of the scheduler, and we need lots of eyes (ideally 
both staring at the patch and testing independently on a cluster) to convince 
ourselves that what is proposed is safe/correct and worthwhile.
 

> CapacityScheduler locking / synchronization improvements
> 
>
> Key: YARN-4198
> URL: https://issues.apache.org/jira/browse/YARN-4198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
>
> In the context of YARN-4193 (which stresses the RM/CS performance) we found 
> several performance problems in the locking/synchronization of the 
> CapacityScheduler, as well as inconsistencies that do not normally surface 
> (incorrect locking order of queues protected by CS locks, etc.). This JIRA 
> proposes several refactorings to improve this.
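
Purely as illustration of the kind of lock weakening such a refactoring can 
involve (not necessarily what this patch does): replace a coarse exclusive 
section with a read/write lock so read-mostly paths stop contending.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class QueueStateSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float usedCapacity;

  float getUsedCapacity() {        // hot read path: shared lock
    lock.readLock().lock();
    try {
      return usedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  void setUsedCapacity(float v) {  // infrequent update: exclusive lock
    lock.writeLock().lock();
    try {
      usedCapacity = v;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}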



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4423) Cleanup lint warnings in resource manager

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060453#comment-15060453
 ] 

Hadoop QA commented on YARN-4423:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-4423 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12778064/YARN-4423.003.patch |
| JIRA Issue | YARN-4423 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10009/console |


This message was automatically generated.



> Cleanup lint warnings in resource manager
> --
>
> Key: YARN-4423
> URL: https://issues.apache.org/jira/browse/YARN-4423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-4423.001.patch, YARN-4423.002.patch, 
> YARN-4423.003.patch
>
>
> There are multiple lint warnings about unchecked usage.  This JIRA is to 
> clean them up, and maybe a few other quibbles as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4461) Redundant nodeLocalityDelay log in LeafQueue

2015-12-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060466#comment-15060466
 ] 

Wangda Tan commented on YARN-4461:
--

Thanks [~eepayne], patch looks good, +1.

> Redundant nodeLocalityDelay log in LeafQueue
> 
>
> Key: YARN-4461
> URL: https://issues.apache.org/jira/browse/YARN-4461
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Trivial
> Attachments: YARN-4461.001.patch
>
>
> In LeafQueue#setupQueueConfigs there's a redundant log of nodeLocalityDelay:
> {code}
> "nodeLocalityDelay = " + nodeLocalityDelay + "\n" +
> "labels=" + labelStrBuilder.toString() + "\n" +
> "nodeLocalityDelay = " +  nodeLocalityDelay + "\n" +
> {code}
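
Presumably the fix is just to drop the duplicated line, leaving something 
like:

{code}
"nodeLocalityDelay = " + nodeLocalityDelay + "\n" +
"labels=" + labelStrBuilder.toString() + "\n" +
{code}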



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060559#comment-15060559
 ] 

Bikas Saha commented on YARN-1197:
--

The API supports it but the backend implementation does not. So in the future, 
based on need, this could be supported compatibly. Do you have a scenario 
where this is essential?

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, graceful, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes the resource allocated to 
> a container is fixed during its lifetime. When users want to change the 
> resources of an allocated container, the only way is to release it and 
> allocate a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will 
> give us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4423) Cleanup lint warnings in resource manager

2015-12-16 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-4423:
---
Attachment: (was: YARN-4423.003.patch)

> Cleanup lint warnings in resource manager
> --
>
> Key: YARN-4423
> URL: https://issues.apache.org/jira/browse/YARN-4423
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-4423.001.patch, YARN-4423.002.patch
>
>
> There are multiple lint warnings about unchecked usage.  This JIRA is to 
> clean them up, and maybe a few other quibbles as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent

2015-12-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4003:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4193

> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
> not consistent
> 
>
> Key: YARN-4003
> URL: https://issues.apache.org/jira/browse/YARN-4003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4003.patch
>
>
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a 
> good fit for ReservationQueue (that have highly dynamic capacity). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4452) NPE when submit Unmanaged application

2015-12-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060568#comment-15060568
 ] 

Junping Du commented on YARN-4452:
--

I just committed the 002 patch to trunk and branch-2. However, I met some 
conflicts/build errors in backporting to 2.6. Naga, can you put up a patch for 
2.6? Thanks!

> NPE when submit Unmanaged application
> -
>
> Key: YARN-4452
> URL: https://issues.apache.org/jira/browse/YARN-4452
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-4452.v1.001.patch, YARN-4452.v1.002.patch
>
>
> As reported in the forum by Wen Lin (w...@pivotal.io)
> {quote}
> [gpadmin@master simple-yarn-app]$ hadoop jar
> ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar
> Client --classpath  ./target/simple-yarn-app-1.1.0.jar -cmd "java
> com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
> {quote}
> The error is:
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type REGISTERED for applicationAttempt
> application_1450079798629_0001
> 664 java.lang.NullPointerException
> 665 at
> org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
> 666 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
> 667 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
> 668 at
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4462) FairScheduler: Disallow preemption from a queue

2015-12-16 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4462:
---
Summary: FairScheduler: Disallow preemption from a queue  (was: Scheduler 
should prevent certain application from being preempted)

> FairScheduler: Disallow preemption from a queue
> ---
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>
> When scheduler preemption is enabled, applications can be preempted if they 
> obtain more resources than they should take.
> When a MapReduce application has some of its resources preempted, it just 
> runs slower. However, when the preempted application is a long-running 
> service, such as Tomcat running in Slider, the service would fail.
> So we should have a flag for an application to indicate to the scheduler that 
> the application should not be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4462) FairScheduler: Disallow preemption from a queue

2015-12-16 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060909#comment-15060909
 ] 

Karthik Kambatla commented on YARN-4462:


Today, FairScheduler allows disabling preemption *for* a queue. It makes sense 
to add the ability to disable preemption *from* a queue as well. 

I have a couple of other FairScheduler items I am working on before I am able 
to get to this. Let me know if anyone is interested in working on this. 

> FairScheduler: Disallow preemption from a queue
> ---
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Karthik Kambatla
>
> When scheduler preemption is enabled, applications can be preempted if they 
> obtain more resources than they should take.
> When a MapReduce application has some of its resources preempted, it just 
> runs slower. However, when the preempted application is a long-running 
> service, such as Tomcat running in Slider, the service would fail.
> So we should have a flag for an application to indicate to the scheduler that 
> the application should not be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not available

2015-12-16 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4428:
---
Attachment: YARN-4428.1.2.patch

The .1.2 patch addresses the checkstyle issues.

> Redirect RM page to AHS page when AHS turned on and RM page is not available
> -
>
> Key: YARN-4428
> URL: https://issues.apache.org/jira/browse/YARN-4428
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch
>
>
> When AHS is turned on, if we can't view an application on the RM page, the 
> RM page should redirect us to the AHS page. For example, when you go to 
> cluster/app/application_1 and the RM no longer remembers the application, we 
> simply get "Failed to read the application application_1", but it would be 
> good for the RM UI to smartly try redirecting to the AHS UI at 
> /applicationhistory/app/application_1 to see if it's there. This redirect 
> usage already exists for logs in the NodeManager UI.
> Also, when AHS is enabled, WebAppProxyServlet should redirect to the AHS page 
> as a fallback when the RM does not remember the app. YARN-3975 tried to do 
> this only when the original tracking URL is not set. But there are many 
> cases, such as when an app fails at launch, where the original tracking URL 
> will be set to point to the RM page, so redirecting to the AHS page won't 
> work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4462) FairScheduler: Disallow preemption from a queue

2015-12-16 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-4462:
--

Assignee: Karthik Kambatla

> FairScheduler: Disallow preemption from a queue
> ---
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Karthik Kambatla
>
> When scheduler preemption is enabled, applications could be preempted if they 
> obtain resource over they should take. 
> When a mapreduce application is preempted some resource, it just runs slower. 
> However, when the preempted application is a long-run service, such as tomcat 
> running in slider, the service would fail.
> So we should have a flag for application to indicate the scheduler that those 
> application should not be preempted. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060978#comment-15060978
 ] 

Jian He commented on YARN-3480:
---

Thanks for updating; a few more comments:
- rename startAttemptIdInStateStore to firstAttemptIdInStore
- I think the block below can be simplified to the one line 
{{app.rmContext.getStateStore().removeApplicationAttempt(attemptId);}}, and then 
the removeAppAttemptFromStateStore method is not needed:
{code}
RMAppAttempt oldestAttempt = app.getRMAppAttempt(attemptId);
if (oldestAttempt != null) {
  removeAppAttemptFromStateStore(app, oldestAttempt);
}
{code}
- the currentAttemptId is actually the nextAttemptId, which is confusing. Could 
you change the logic to actually be currentAttemptId?
- could you add a test case in RMStateStoreTestBase for the remove attempt?

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, YARN-3480.09.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However, 
> that makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes 
> the RM recovery process much slower. It might be better to cap the max number 
> of attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large, so we need 
> to delete some of the attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-12-16 Thread Brook Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061124#comment-15061124
 ] 

Brook Zhou commented on YARN-3223:
--

The test breakages are unrelated to this patch.

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, 
> YARN-3223-v2.patch, YARN-3223-v3.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: make RMNode keep track of the old resource for possible rollback, 
> keep the available resource at 0, and update the used resource as containers 
> finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops

2015-12-16 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060892#comment-15060892
 ] 

Subru Krishnan commented on YARN-2575:
--

Thanks [~seanpo03] for taking this up, this is an important patch.

I took a look at your patch and have a few suggestions to simplify it:
  * Let us move the *ReservationACLs* to the *ReservationSystem* and add a 
getter. With this approach, we need not change the _RM_ or _ClientRMService_ 
constructor, a non-trivial chunk of the patch is addressing the resulting 
conflicts.
  * We should add an explicit ACL for *LIST_RESERVATIONS* to avoid the 
convoluted implied if checks (see the rough sketch after this list).
  * I feel that we can replace the checks in the _Scheduler/Queue_ hierarchy 
with one in the *ClientRMService::submitApp*.
  * The changes in the scheduler xml files are not needed as the 
_ReservationSystem_ is not enabled by default yet.
  * In *ACLsTestBase*, you can refer to 
*TestReservationSystemWithRMHA::addNodeCapacityToPlan* for draining the 
dispatcher to ensure node is registered with RM.
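
To make the *LIST_RESERVATIONS* point concrete, here is a rough sketch of the 
ACL shape being suggested; the enum name and values are illustrative 
assumptions, not the final API:

{code}
// Rough sketch only: names are illustrative assumptions, not the final API.
public enum ReservationACL {
  SUBMIT_RESERVATIONS,   // create
  UPDATE_RESERVATIONS,
  DELETE_RESERVATIONS,
  LIST_RESERVATIONS      // explicit, instead of implied if checks
}
{code}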

> Consider creating separate ACLs for Reservation create/update/delete/list ops
> -
>
> Key: YARN-2575
> URL: https://issues.apache.org/jira/browse/YARN-2575
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-2575.v1.patch
>
>
> YARN-1051 introduces the ReservationSystem and in the current implementation 
> anyone who can submit applications can also submit reservations. This JIRA is 
> to evaluate creating separate ACLs for Reservation create/update/delete ops.
> Depends on YARN-4340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2015-12-16 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4414:
---
Attachment: YARN-4414.1.3.patch

Oops, my bad; I intended to name the latest patch .1.3. Removed the .2.2 patch 
and re-uploaded the latest as .1.3.

> Nodemanager connection errors are retried at multiple levels
> 
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, 
> YARN-4414.1.3.patch, YARN-4414.1.patch
>
>
> This is related to YARN-3238.  Ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException.  
> The fix for YARN-3238 was too specific, and I think we need a more general 
> solution to catch a wider array of connection errors that can occur to avoid 
> retrying them both at the RPC layer and at the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2015-12-16 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4414:
---
Attachment: (was: YARN-4414.2.2.patch)

> Nodemanager connection errors are retried at multiple levels
> 
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, 
> YARN-4414.1.patch
>
>
> This is related to YARN-3238.  Ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException.  
> The fix for YARN-3238 was too specific, and I think we need a more general 
> solution to catch a wider array of connection errors that can occur to avoid 
> retrying them both at the RPC layer and at the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4461) Redundant nodeLocalityDelay log in LeafQueue

2015-12-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061104#comment-15061104
 ] 

Jason Lowe commented on YARN-4461:
--

+1 committing this.

> Redundant nodeLocalityDelay log in LeafQueue
> 
>
> Key: YARN-4461
> URL: https://issues.apache.org/jira/browse/YARN-4461
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Trivial
> Attachments: YARN-4461.001.patch
>
>
> In LeafQueue#setupQueueConfigs there's a redundant log of nodeLocalityDelay:
> {code}
> "nodeLocalityDelay = " + nodeLocalityDelay + "\n" +
> "labels=" + labelStrBuilder.toString() + "\n" +
> "nodeLocalityDelay = " +  nodeLocalityDelay + "\n" +
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060935#comment-15060935
 ] 

Hudson commented on YARN-4225:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8978 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8978/])
YARN-4225. Add preemption status to yarn queue -status for capacity (wangda: 
rev 7faa406f27f687844c941346f59a27a375af3233)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/QueueInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/QueueCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/QueueInfoPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060936#comment-15060936
 ] 

Hudson commented on YARN-4416:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8978 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8978/])
YARN-4416. Deadlock due to synchronised get Methods in AbstractCSQueue. 
(wangda: rev 9b856d9787be5ec88ef34574b9b98755d7b669ea)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java


> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch, 
> deadlock.log
>
>
> While debugging in eclipse came across a scenario where in i had to get to 
> know the name of the queue but every time i tried to see the queue it was 
> getting hung. On seeing the stack realized there was a deadlock but on 
> analysis found out that it was only due to *queue.toString()* during 
> debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Hence we need to ensure following :
> # queueCapacity, resource-usage has their own read/write lock hence 
> synchronization is not req
> # numContainers is volatile hence synchronization is not req.
> # read/write lock could be added to Ordering Policy. Read operations don't 
> need synchronized. So {{getNumApplications}} doesn't need synchronized. 
> (First 2 will be handled in this jira and the third will be handled in 
> YARN-4443)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060937#comment-15060937
 ] 

Hudson commented on YARN-4293:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8978 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8978/])
YARN-4293. ResourceUtilization should be a part of yarn node CLI. (Sunil 
(wangda: rev 79c41b1d83e981ae74cb8b58ffcf7907b7612ad4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/ResourceUtilizationPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/ResourceUtilization.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeResourceMonitorImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceUtilization.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceUtilizationInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceUtilizationPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnClusterNodeUtilization.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeResourceMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeReport.java
* 

[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060952#comment-15060952
 ] 

Hadoop QA commented on YARN-4304:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} Patch generated 18 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 267, now 275). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 10s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 140m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesForCSWithPartitions |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 

[jira] [Commented] (YARN-4438) Implement RM leader election with curator

2015-12-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061126#comment-15061126
 ] 

Jian He commented on YARN-4438:
---

Thanks for the detailed review!

bq. All that is needed is store the CuratorFramework instance in RMContext.
Actually, I need to refactor the zkClient creation logic out of ZKRMStateStore, 
as the zkClient requires a bunch of other configs. And because ZKRMStateStore 
is currently an active service, it cannot simply be moved to an AlwaysOn 
service. So I'd like to do that separately, to minimize the core change in this 
jira.
bq. The instance, rm, is not used anywhere. Why even pass it?
I was earlier directly calling rm.transitionToActive instead of calling 
AdminService#transitionToActive, but just to minimize the change and keep it 
consistent with EmbeddedElectorService, I changed it to call 
AdminService#transitionToActive. 
The only extra thing AdminService does is refresh the ACLs. Suppose the 
shared-storage-based configuration provider is not enabled (which is the most 
usual case): why do we need to refresh the configs? It cannot read the remote 
RM's config anyway. Without these refresh calls, we can avoid bugs like 
YARN-3893. Also, the RM itself does not need to depend on the AdminACL to 
transition to active/standby; it should always have the permission to do that. 
I'd like to change this part so the RM does not refresh the configs if the 
shared-storage-based config provider is not enabled. 

bq. why sleep for 1 second
To avoid a busy loop and rejoining immediately. That's what 
ActiveStandbyElector does too. It could be more than 1s. I don't think we need 
one more config for this.

bq. If it is due to close(), don't we want to force give-up so the other RM 
becomes active?  If it is on initAndStartLeaderLatch(), this RM will never 
become active; don't we want to just die?
What do you mean by force give-up? Exit the RM?
The underlying Curator implementation will retry the connection in the 
background even though the exception is thrown; see the Guaranteeable interface 
in Curator. I think exiting the RM is too harsh here. Even though the RM 
remains at standby, all services should already be shut down, so there's no 
harm to the end users?

I have one question about ActiveStandbyCheckThread: if we make zkStateStore and 
the elector share the same zkClient, do we still need the 
ActiveStandbyCheckThread? The elector itself should get a notification when the 
connection is lost.

bq. notLeader: Again, we should likely do more than just logging.
This is currently what EmbeddedElectorService is doing. If the leadership is 
already lost from zk's perspective, the other RM should take up the leadership.
 
bq. How about adding a method called closeLeaderLatch to complement 
initAndStart? That would help us avoid cases like the null check missing in 
rejoinElection?
I think leaderLatch can never be null?
 
bq.  may be we should have a config to use embedded-elector instead of 
curator-elector e.g. yarn.resourcemanager.ha.use-active-standby-elector
This flag is just a temporary thing; a lot of test cases would need to be 
changed without it. I plan to remove this flag, and the embeddedElector code 
too, in a follow-up.

bq. Why change the argument to transitionToStandby from true to false? in the 
following method, reinitialize(initialize) should be called outside the if. No?
Why does it need to be called outside of {{if (state == 
HAServiceProtocol.HAServiceState.ACTIVE)}}? This is a fresh start; it does not 
need to call reinitialize.

bq. still feel the AdminService should be the one handling the 
LeaderElectorService. Also, the LeaderElectorService talks to AdminService for 
transitions to active/standby.
Currently, AdminService does not depend on EmbeddedLeaderElector at all; all it 
does is initialize EmbeddedElectorService. Maybe the elector does not need to 
depend on AdminService either, i.e., it need not refresh the ACLs if the 
shared-storage-based config provider is not enabled.

Will update other comments accordingly.
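
For reference, the basic shape of the Curator-based election under discussion 
looks roughly like the following minimal sketch, assuming Curator's LeaderLatch 
recipe. The latch path, ids, and the printlns standing in for the 
transitionToActive/transitionToStandby calls are illustrative, not the actual 
patch:

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class RMElectorSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative values, not real configuration keys.
    String zkConnect = "localhost:2181";
    String rmId = "rm1";

    CuratorFramework client = CuratorFrameworkFactory.newClient(
        zkConnect, new ExponentialBackoffRetry(1000, 3));
    client.start();

    LeaderLatch latch = new LeaderLatch(client, "/yarn-leader-election", rmId);
    latch.addListener(new LeaderLatchListener() {
      @Override
      public void isLeader() {
        // The real code would call AdminService#transitionToActive here.
        System.out.println(rmId + " became active");
      }
      @Override
      public void notLeader() {
        // The real code would call AdminService#transitionToStandby here.
        System.out.println(rmId + " became standby");
      }
    });
    // Connection loss and retries are handled by Curator in the background.
    latch.start();
    Thread.sleep(Long.MAX_VALUE);
  }
}
{code}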

> Implement RM leader election with curator
> -
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch
>
>
> This is to implement the leader election with curator instead of the 
> ActiveStandbyElector from common package,  this also avoids adding more 
> configs in common to suit RM's own needs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4462) Scheduler should prevent certain application from being preempted

2015-12-16 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4462:
---
Component/s: (was: scheduler)
 fairscheduler

> Scheduler should prevent certain application from being preempted
> -
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>
> When scheduler preemption is enabled, applications could be preempted if they 
> obtain more resources than they should take. 
> When a MapReduce application has some of its resources preempted, it just runs 
> slower. However, when the preempted application is a long-running service, such 
> as Tomcat running in Slider, the service would fail.
> So we should have a flag for the application to indicate to the scheduler that 
> the application should not be preempted. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should be lower.

2015-12-16 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4464:
---
Target Version/s: 3.0.0
Priority: Blocker  (was: Major)
Hadoop Flags: Incompatible change

Agree that the value is too high in branch-2. We advise our customers to lower 
this all the time. However, changing it in branch-2 would be incompatible. I 
think we should do this only on trunk; making it a blocker so we don't miss it. 

How about setting the default to 0? 
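
In the meantime, clusters hitting this can lower the retention explicitly; a 
minimal example of the relevant properties (the values below are illustrative, 
not proposed defaults):

{code}
yarn.resourcemanager.max-completed-applications=1000
yarn.resourcemanager.state-store.max-completed-applications=0
{code}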

> default value of yarn.resourcemanager.state-store.max-completed-applications 
> should be lower.
> --
>
> Key: YARN-4464
> URL: https://issues.apache.org/jira/browse/YARN-4464
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: resourcemanager
>Reporter: KWON BYUNGCHANG
>Priority: Blocker
>
> My cluster has 120 nodes.
> I configured the RM Restart feature:
> {code}
> yarn.resourcemanager.recovery.enabled=true
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
> yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore
> {code}
> Unfortunately, I did not configure 
> {{yarn.resourcemanager.state-store.max-completed-applications}},
> so that property took its default value of 10,000.
> I restarted the RM after changing another configuration.
> I expected the RM to restart immediately, but the recovery process was very 
> slow; I waited about 20 minutes before realizing that 
> {{yarn.resourcemanager.state-store.max-completed-applications}} was missing.
> Its default value is very large.
> We need to change it to a lower value or document a notice on the [RM Restart 
> page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should be lower.

2015-12-16 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned YARN-4464:
--

Assignee: Daniel Templeton

> default value of yarn.resourcemanager.state-store.max-completed-applications 
> should be lower.
> --
>
> Key: YARN-4464
> URL: https://issues.apache.org/jira/browse/YARN-4464
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: resourcemanager
>Reporter: KWON BYUNGCHANG
>Assignee: Daniel Templeton
>Priority: Blocker
>
> My cluster has 120 nodes.
> I configured the RM Restart feature:
> {code}
> yarn.resourcemanager.recovery.enabled=true
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
> yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore
> {code}
> Unfortunately, I did not configure 
> {{yarn.resourcemanager.state-store.max-completed-applications}},
> so that property took its default value of 10,000.
> I restarted the RM after changing another configuration.
> I expected the RM to restart immediately, but the recovery process was very 
> slow; I waited about 20 minutes before realizing that 
> {{yarn.resourcemanager.state-store.max-completed-applications}} was missing.
> Its default value is very large.
> We need to change it to a lower value or document a notice on the [RM Restart 
> page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2015-12-16 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4414:
---
Attachment: YARN-4414.2.2.patch

The .2.2 patch fixes the whitespace issue.

> Nodemanager connection errors are retried at multiple levels
> 
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, 
> YARN-4414.1.patch, YARN-4414.2.2.patch
>
>
> This is related to YARN-3238.  Ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException.  
> The fix for YARN-3238 was too specific, and I think we need a more general 
> solution to catch a wider array of connection errors that can occur to avoid 
> retrying them both at the RPC layer and at the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3586) RM only get back addresses of Collectors that NM needs to know.

2015-12-16 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061094#comment-15061094
 ] 

Varun Saxena commented on YARN-3586:


Thanks Junping for the patch.
Overall the patch looks good.

A couple of nits:
# Do we need the changes in MockNodes? I do not see runningApps being updated 
anywhere. Even the MockRMApp changes do not seem to be required for the test 
case added, but anyway we have to override these methods, so that should be 
fine.
# The comment added above the debug log (// Log a debug info if collector 
address is not found.) is not required; the debug log itself is 
self-explanatory.

> RM only get back addresses of Collectors that NM needs to know.
> ---
>
> Key: YARN-3586
> URL: https://issues.apache.org/jira/browse/YARN-3586
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3586-demo.patch, YARN-3586-feature-YARN-2928.patch
>
>
> After YARN-3445, RM cache runningApps for each NM. So RM heartbeat back to NM 
> should only include collectors' address for running applications against 
> specific NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4461) Redundant nodeLocalityDelay log in LeafQueue

2015-12-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061170#comment-15061170
 ] 

Hudson commented on YARN-4461:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8980 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8980/])
YARN-4461. Redundant nodeLocalityDelay log in LeafQueue. Contributed by (jlowe: 
rev 91828fef6b9314f72d1f973f00e81404dc6bba91)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java


> Redundant nodeLocalityDelay log in LeafQueue
> 
>
> Key: YARN-4461
> URL: https://issues.apache.org/jira/browse/YARN-4461
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4461.001.patch
>
>
> In LeafQueue#setupQueueConfigs there's a redundant log of nodeLocalityDelay:
> {code}
> "nodeLocalityDelay = " + nodeLocalityDelay + "\n" +
> "labels=" + labelStrBuilder.toString() + "\n" +
> "nodeLocalityDelay = " +  nodeLocalityDelay + "\n" +
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4445) Unify the term flowId and flowName in timeline v2 codebase

2015-12-16 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061212#comment-15061212
 ] 

Li Lu commented on YARN-4445:
-

All failing system tests passed on my local machine. +1. Since it has been 
hanging here for about a day, I'm committing this shortly. 

> Unify the term flowId and flowName in timeline v2 codebase
> --
>
> Key: YARN-4445
> URL: https://issues.apache.org/jira/browse/YARN-4445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Zhan Zhang
>  Labels: refactor
> Attachments: YARN-4445-feature-YARN-2928.001.patch, YARN-4445.patch
>
>
> Flow names are not sufficient to identify a flow. I noticed we used both 
> "flowName" and "flowId" to point to the same thing. We need to unify them to 
> flowName. Otherwise, front end users may think flow id is a top level concept 
> and try to directly locate a flow by its flow id.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4461) Redundant nodeLocalityDelay log in LeafQueue

2015-12-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061298#comment-15061298
 ] 

Eric Payne commented on YARN-4461:
--

Thanks a lot, [~jlowe] and [~leftnoteasy]!

> Redundant nodeLocalityDelay log in LeafQueue
> 
>
> Key: YARN-4461
> URL: https://issues.apache.org/jira/browse/YARN-4461
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: YARN-4461.001.patch
>
>
> In LeafQueue#setupQueueConfigs there's a redundant log of nodeLocalityDelay:
> {code}
> "nodeLocalityDelay = " + nodeLocalityDelay + "\n" +
> "labels=" + labelStrBuilder.toString() + "\n" +
> "nodeLocalityDelay = " +  nodeLocalityDelay + "\n" +
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061296#comment-15061296
 ] 

Eric Payne commented on YARN-4225:
--

Thanks a lot, [~leftnoteasy]

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4463) Container launch failure when yarn.nodemanager.log-dirs directory path contains space

2015-12-16 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061472#comment-15061472
 ] 

Rohith Sharma K S commented on YARN-4463:
-

I am not sure whether that works; have you tried configuring the path by 
escaping the space?

> Container launch failure when yarn.nodemanager.log-dirs directory path 
> contains space
> -
>
> Key: YARN-4463
> URL: https://issues.apache.org/jira/browse/YARN-4463
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> If the container log directory path contains a space, container launch fails.
> Even with DEBUG logs enabled, the only log we are able to get is:
> {noformat}
> Container id: container_e32_1450233925719_0009_01_22
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:912)
> at org.apache.hadoop.util.Shell.run(Shell.java:823)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1102)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:304)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We should make container launch support NM log directory paths containing spaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4195:
---
Attachment: YARN-4195.patch

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent

2015-12-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061563#comment-15061563
 ] 

Carlo Curino commented on YARN-4003:


[~Sunil G], I think your understanding is correct. The point is that the AM 
percent limits for a PlanQueue mostly clash with the highly dynamic nature of 
sub-queues being created/resized/destroyed on the fly. In our experiments it 
was never useful, but it sometimes prevented jobs from starting. I don't think 
the proposed patch is perfect (for the reasons you pointed out), but at least 
it allows jobs to go through. Do you have in mind any more principled way to 
set this?


> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
> not consistent
> 
>
> Key: YARN-4003
> URL: https://issues.apache.org/jira/browse/YARN-4003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4003.patch
>
>
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a 
> good fit for ReservationQueue (which has highly dynamic capacity). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4454) NM to nodelabel mapping going wrong after RM restart

2015-12-16 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061476#comment-15061476
 ] 

Bibin A Chundatt commented on YARN-4454:


On the second recovery the ordering goes wrong:

{noformat}
2015-12-14 17:17:54,906 INFO 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
NM=host-10-19-92-188:64318, labels=[ResourcePool_1]
2015-12-14 17:17:54,906 INFO 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   
NM=host-10-19-92-188:0, labels=[ResourcePool_null]{noformat}




> NM to nodelabel mapping going wrong after RM restart
> 
>
> Key: YARN-4454
> URL: https://issues.apache.org/jira/browse/YARN-4454
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> *Steps to reproduce*
> 1. Create a cluster with 2 NMs
> 2. Add labels X,Y to the cluster
> 3. Replace the label of node 1 using ,x
> 4. Replace the label of node 1 by ,y
> 5. Again replace the label of node 1 by ,x
> Check the cluster label mapping: HOSTNAME1 will be mapped with X.
> Now restart the RM 2 times; the NODE LABEL mapping of HOSTNAME1:PORT changes 
> to Y.
> {noformat}
> 2015-12-14 17:17:54,901 INFO 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: 
> 

[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type

2015-12-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061477#comment-15061477
 ] 

Sunil G commented on YARN-4164:
---

Thanks [~rohithsharma] for updating the patch. Looks good. [~jianhe], could 
you please help take a look as well? 

> Retrospect update ApplicationPriority API return type
> -
>
> Key: YARN-4164
> URL: https://issues.apache.org/jira/browse/YARN-4164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, 
> 0003-YARN-4164.patch, 0004-YARN-4164.patch
>
>
> Currently the {{ApplicationClientProtocol#updateApplicationPriority()}} API 
> returns an empty UpdateApplicationPriorityResponse.
> But the RM updates the priority to cluster.max-priority if the given priority 
> is greater than cluster.max-priority. In this scenario, we need to intimate 
> the updated priority back to the client, rather than just keeping quiet while 
> the client assumes that the given priority itself was taken.
> The same scenario can also happen during application submission, but I feel 
> that when explicitly invoked via 
> ApplicationClientProtocol#updateApplicationPriority(), the response should 
> carry the updated priority. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2934) Improve handling of container's stderr

2015-12-16 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2934:

Attachment: YARN-2934.v1.007.patch

Thanks for the review [~jira.shegalov],
bq. Please make sure that the patch does not introduce new problems. Both 
checkstyle and findbugs report problems related to the patch. 
Well, I was earlier not sure whether the findbugs issue reported was a valid 
one; after going through the description given in the findbugs report, I got 
the understanding about it! I thought of correcting the synchronization issue, 
but based on your last comment I made it fetch every time from the conf in the 
latest patch.
Most of the issues reported by checkstyle are not directly induced by the patch 
or are OK to live with. The valid issues are incorporated in the latest patch.

bq. I'd rather have a longer config value than adding more code to make 
patterns case-insensitive. In practice we mostly need stderr
I would like to differ here. IMHO the code added is not much, and anyway I have 
already finished coding it; when there is a way to avoid configuring multiple 
cases, why expect the user to configure both cases? Though in most cases 
{{stderr}} is sufficient, any kind of app can be submitted and the pattern can 
mix and match cases too; e.g., the distributed shell client uses 
{{"AppMaster.stderr"}} as the error file name for the AM log (emphasizing the 
different casing here, though it contains stderr).
I am open to moving to a glob approach if there is any flaw/disadvantage in the 
current one! (A rough sketch of the case-insensitive matching is below.)

 bq. In general, don't try optimize for the failure case. Things like look like 
a bug. Simply get it from conf exactly when it's needed.
OK, I have corrected this for the tail size, and along similar lines for 
{{pattern}} too, in the latest patch.
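
A minimal sketch of the case-insensitive matching mentioned above; the pattern 
and class name are illustrative assumptions, not the patch's actual names:

{code}
import java.util.regex.Pattern;

public class ErrFileMatcherSketch {
  // Illustrative default; the real configured pattern may differ.
  static final String DEFAULT_ERR_PATTERN = ".*stderr.*";

  // Compile once, case-insensitively, so "stderr", "STDERR" and
  // "AppMaster.stderr" all match without configuring each case.
  static final Pattern ERR_FILE =
      Pattern.compile(DEFAULT_ERR_PATTERN, Pattern.CASE_INSENSITIVE);

  public static void main(String[] args) {
    System.out.println(ERR_FILE.matcher("AppMaster.STDERR").matches()); // true
    System.out.println(ERR_FILE.matcher("container.stdout").matches()); // false
  }
}
{code}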

> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch, YARN-2934.v1.007.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061562#comment-15061562
 ] 

Carlo Curino commented on YARN-4195:


Regarding clean-up: [~subru], I will need some help to make the new 
node-label-aware {{InMemoryReservationAllocation}} play nice with HA (we have a 
couple of test failures on this). 

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4197) Modify PlanFollower to handle reservation with node-labels

2015-12-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4197:
---
Assignee: Alexey Tumanov

> Modify PlanFollower to handle reservation with node-labels
> --
>
> Key: YARN-4197
> URL: https://issues.apache.org/jira/browse/YARN-4197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
>
> This JIRA tracks all the changes needed in the PlanFollower(s) 
> to handle multiple node-labels: properly publish reservations to the 
> underlying scheduler, as well as gather from the NodeLabelsManager the
> per-label availability of resources and inform the Plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4196) Enhance ReservationAgent(s) to support node-label

2015-12-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-4196:
---
Assignee: Ishai Menache  (was: Carlo Curino)

> Enhance ReservationAgent(s) to support node-label
> -
>
> Key: YARN-4196
> URL: https://issues.apache.org/jira/browse/YARN-4196
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Ishai Menache
>
> As part of umbrella jira YARN-4193 we need to extend the algorithm that 
> places a ReservationRequest in a Plan to handle multiple node-labels (and 
> expressions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4196) Enhance ReservationAgent(s) to support node-label

2015-12-16 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino reassigned YARN-4196:
--

Assignee: Carlo Curino

> Enhance ReservationAgent(s) to support node-label
> -
>
> Key: YARN-4196
> URL: https://issues.apache.org/jira/browse/YARN-4196
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>
> As part of umbrella jira YARN-4193 we need to extend the algorithm that 
> places a ReservationRequest in a Plan to handle multiple node-labels (and 
> expressions).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061577#comment-15061577
 ] 

sandflee commented on YARN-1197:


Seems complicated for the AM to do this, especially since we added disk and 
network to container resources.
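
For context, the release-and-reallocate workaround the description refers to 
looks roughly like this from the AM side. A sketch assuming the AMRMClient API; 
the capability, constraints, and priority are illustrative:

{code}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class ResizeWorkaroundSketch {
  // Today's only option: release the container and request a new, larger one.
  static void resize(AMRMClient<AMRMClient.ContainerRequest> amRMClient,
                     Container container) {
    amRMClient.releaseAssignedContainer(container.getId());
    amRMClient.addContainerRequest(new AMRMClient.ContainerRequest(
        Resource.newInstance(4096, 2),  // desired new capability (illustrative)
        null, null,                     // no node/rack constraints
        Priority.newInstance(0)));
  }
}
{code}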

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, graceful, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes resource allocated to a 
> container is fixed during the lifetime of it. When users want to change a 
> resource 
> of an allocated container the only way is releasing it and allocating a new 
> container with expected size.
> Allowing run-time changing resources of an allocated container will give us 
> better control of resource usage in application side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should be lower.

2015-12-16 Thread KWON BYUNGCHANG (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061481#comment-15061481
 ] 

KWON BYUNGCHANG commented on YARN-4464:
---

0 is good :)

> default value of yarn.resourcemanager.state-store.max-completed-applications 
> should be lower.
> --
>
> Key: YARN-4464
> URL: https://issues.apache.org/jira/browse/YARN-4464
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: resourcemanager
>Reporter: KWON BYUNGCHANG
>Assignee: Daniel Templeton
>Priority: Blocker
>
> my cluster has 120 nodes.
> I configured RM Restart feature.
> {code}
> yarn.resourcemanager.recovery.enabled=true
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
> yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore
> {code}
> Unfortunately, I did not configure
> {{yarn.resourcemanager.state-store.max-completed-applications}},
> so that property took its default value of 10,000.
> I restarted the RM after changing another configuration and expected it to
> restart immediately, but the recovery process was very slow; I waited about
> 20 minutes before realizing that
> {{yarn.resourcemanager.state-store.max-completed-applications}} was missing.
> Its default value is very large.
> We need to change it to a lower value or add a notice on the [RM Restart
> page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops

2015-12-16 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-2575:
--
Attachment: YARN-2575.v2.patch

> Consider creating separate ACLs for Reservation create/update/delete/list ops
> -
>
> Key: YARN-2575
> URL: https://issues.apache.org/jira/browse/YARN-2575
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-2575.v1.patch, YARN-2575.v2.patch
>
>
> YARN-1051 introduces the ReservationSystem and in the current implementation 
> anyone who can submit applications can also submit reservations. This JIRA is 
> to evaluate creating separate ACLs for Reservation create/update/delete ops.
> Depends on YARN-4340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-16 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061560#comment-15061560
 ] 

Carlo Curino commented on YARN-4195:


The posted patch is a rebase of our working branch. It is useful to kickstart 
reviewing/discussion, but probably not ready to commit as-is.

In general, the patch adds support for tracking per-label resource
allocations in the ReservationSystem (and performs a bit of cleanup).

The one key point worth discussing is related to partitions vs. labels:
 # To ensure that capacity per node-label means anything, both queues in the CS
(and reservations here) are forced to refer to "partitions" of the underlying
nodes.
 # This is annoying and limiting. For example, if I have GPU and PUBLICIP as my
desired user-visible labels, I would have to manually express all 4
combinations GPU_PUBLICIP, GPU_NOT-PUBLICIP, NOT-GPU_PUBLICIP,
NOT-GPU_NOT-PUBLICIP when defining queues, and worse, users would be forced to
express their needs in the same form. For example, a user that only cares
about running on {{GPU}} would have to say: {{GPU_PUBLICIP OR
GPU_NOT-PUBLICIP}}.
 # We propose an improvement:
 ## Internally, the system tracks partitions.
 ## Administrators configuring queues do so at the partition level.
 ## Users are allowed to express their job needs in terms of labels (and the
system internally converts these into partitions).
 ## Users can reserve in terms of labels (with the same internal conversion).

It is generally provable that an arbitrary expression of labels can be
represented as an OR of partitions (i.e., in disjunctive normal form). In this
and future patches we have a version of this using a mix of JEXL and ad-hoc
accelerations (JEXL was too slow in some of our tests). However, I believe
[~chris.douglas] has a better version that uses a cool algorithm skipping the
conversion to DNF. We should plug that in here.

More generally, this improvement can be of general use for CS (and in the 
future FS) to expose a nicer API to users.

(Once again, marking the patch as ready to get the conversation going; some
cleanup is still required.)
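To make the labels-to-partitions conversion concrete, here is a toy
brute-force sketch (my own illustration, not code from the patch; the class
and method names are made up). Any boolean label expression equals the OR of
the node "partitions" that satisfy it, i.e. its DNF; the 2^n enumeration below
is exactly the blow-up that the JEXL-plus-accelerations approach, or
[~chris.douglas]'s DNF-free algorithm, tries to avoid.
{code}
import java.util.*;
import java.util.function.Predicate;

public class LabelsToPartitions {
  // Enumerate all 2^n label subsets (partitions) and keep those satisfying
  // the expression; the result is the expression's OR-of-partitions (DNF).
  static List<Set<String>> toDnf(List<String> labels,
      Predicate<Set<String>> expr) {
    List<Set<String>> dnf = new ArrayList<>();
    for (int mask = 0; mask < (1 << labels.size()); mask++) {
      Set<String> partition = new HashSet<>();
      for (int i = 0; i < labels.size(); i++) {
        if ((mask & (1 << i)) != 0) {
          partition.add(labels.get(i));
        }
      }
      if (expr.test(partition)) {
        dnf.add(partition);  // one term of the OR
      }
    }
    return dnf;
  }

  public static void main(String[] args) {
    // "GPU" (regardless of PUBLICIP) expands to {GPU} OR {GPU, PUBLICIP}
    System.out.println(toDnf(Arrays.asList("GPU", "PUBLICIP"),
        p -> p.contains("GPU")));
  }
}
{code}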
 

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4452) NPE when submit Unmanaged application

2015-12-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061260#comment-15061260
 ] 

Naganarasimha G R commented on YARN-4452:
-

Will provide it shortly!

> NPE when submit Unmanaged application
> -
>
> Key: YARN-4452
> URL: https://issues.apache.org/jira/browse/YARN-4452
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-4452.v1.001.patch, YARN-4452.v1.002.patch
>
>
> As reported in the forum by Wen Lin (w...@pivotal.io)
> {quote}
> [gpadmin@master simple-yarn-app]$ hadoop jar
> ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar
> Client --classpath  ./target/simple-yarn-app-1.1.0.jar -cmd "java
> com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
> {quote}
> The error is as follows:
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type REGISTERED for applicationAttempt
> application_1450079798629_0001
> 664 java.lang.NullPointerException
> 665 at
> org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
> 666 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
> 667 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
> 668 at
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061301#comment-15061301
 ] 

Hadoop QA commented on YARN-3458:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 31s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
29s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 56s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775355/YARN-3458-9.patch |
| JIRA Issue | YARN-3458 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b7023ebed024 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 3c0adac |
| findbugs | v3.0.0 |
| JDK v1.7.0_91  Test Results | 

[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061393#comment-15061393
 ] 

MENG DING commented on YARN-1197:
-

[~sandflee], for now you can achieve the goal of increasing and decreasing 
different resource indices by sending separate resource change requests, with 
each request only changing one index.
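A sketch of that workflow (only {{Resource.newInstance}} below is real YARN
API; the client call and variable names are hypothetical, just to show one
dimension changing per request):
{code}
// Assumed starting allocation: 4096 MB, 4 vcores.
Resource moreMem = Resource.newInstance(8192, 4);  // step 1: raise memory only
Resource lessCpu = Resource.newInstance(8192, 2);  // step 2: lower vcores only

amClient.requestContainerResourceChange(containerId, moreMem);  // hypothetical call
// ... wait until the increase is granted before sending the decrease ...
amClient.requestContainerResourceChange(containerId, lessCpu);  // hypothetical call
{code}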

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, graceful, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes the resources allocated
> to a container are fixed during its lifetime. When users want to change the
> resources of an allocated container, the only way is to release it and
> allocate a new container of the expected size.
> Allowing run-time changes to the resources of an allocated container gives
> us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061407#comment-15061407
 ] 

Hadoop QA commented on YARN-4428:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 50s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 57s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 1s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server (total was 162, now 164). 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 15s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 103m 46s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 35s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 51s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 280m 30s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 

[jira] [Assigned] (YARN-3458) CPU resource monitoring in Windows

2015-12-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned YARN-3458:
---

Assignee: Chris Nauroth  (was: Inigo Goiri)

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch, YARN-3458-9.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4462) FairScheduler: Disallow preemption from a queue

2015-12-16 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061280#comment-15061280
 ] 

Tao Jie commented on YARN-4462:
---

Hi [~kasha],
I have made some attempts at this on FairScheduler based on 2.6.0, and I would
like to work on this.
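For concreteness, one possible shape for such a per-queue switch in the
fair-scheduler allocation file could be (purely illustrative; no such setting
exists at this point):
{code}
<allocations>
  <queue name="services">
    <!-- hypothetical flag: apps in this queue are never preempted -->
    <allowPreemptionFrom>false</allowPreemptionFrom>
  </queue>
</allocations>
{code}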

> FairScheduler: Disallow preemption from a queue
> ---
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Karthik Kambatla
>
> When scheduler preemption is enabled, applications can be preempted if they
> hold more resources than they should.
> When a MapReduce application has some of its resources preempted, it just
> runs slower. However, when the preempted application is a long-running
> service, such as Tomcat running in Slider, the service would fail.
> So we should have a flag for an application to indicate to the scheduler
> that the application should not be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows

2015-12-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3458:

Assignee: Inigo Goiri  (was: Chris Nauroth)

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch, YARN-3458-9.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061347#comment-15061347
 ] 

sandflee commented on YARN-1197:


Long-running user applications run on our YARN platform, and they can change
container resources as they like. Forbidding an increase of one resource
together with a decrease of another seems puzzling, since changing both at
once is the most common case.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, graceful, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf
>
>
> The current YARN resource management logic assumes the resources allocated
> to a container are fixed during its lifetime. When users want to change the
> resources of an allocated container, the only way is to release it and
> allocate a new container of the expected size.
> Allowing run-time changes to the resources of an allocated container gives
> us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-16 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061186#comment-15061186
 ] 

Li Lu commented on YARN-4224:
-

Discussed the PUT use case with Wangda. Right now we're not planning any
write use cases for the web UI, especially since we assume all data comes from
the timeline reader server. Therefore, let's focus on the GET operations and
make sure those endpoints are right.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows

2015-12-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-3458:

Hadoop Flags: Reviewed

+1 for patch v9.  I'll wait a few days before committing, since I see other 
watchers on the issue.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch, YARN-3458-9.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4257) Move scheduler validateConf method to AbstractYarnScheduler and make it protected

2015-12-16 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated YARN-4257:
-
Attachment: YARN-4257.patch

> Move scheduler validateConf method to AbstractYarnScheduler and make it 
> protected
> -
>
> Key: YARN-4257
> URL: https://issues.apache.org/jira/browse/YARN-4257
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Swapnil Daingade
>Assignee: Rich Haase
>  Labels: easyfix
> Attachments: YARN-4257.patch
>
>
> Currently FairScheduler, CapacityScheduler and FifoScheduler each have a
> method private void validateConf(Configuration conf).
> All three methods validate the minimum and maximum scheduler allocations for
> cpu and memory (with minor differences). FairScheduler supports 0 as the
> minimum allocation for cpu and memory, while CapacityScheduler and
> FifoScheduler do not. We can move this code to AbstractYarnScheduler (which
> avoids code duplication) and make it protected for individual schedulers to
> override.
> Why do we care about a minimum allocation of 0 for cpu and memory?
> We contribute to a project called Apache Myriad that runs YARN on Mesos.
> Myriad supports a feature called fine-grained scaling (fgs). In fgs, an NM
> is launched with zero capacity (0 cpu and 0 mem). When a YARN container is
> to be run on the NM, a Mesos offer for that node is accepted and the NM
> capacity is dynamically scaled up to match the accepted Mesos offer. On
> completion of the YARN container, resources are returned to Mesos and the NM
> capacity is scaled back down to zero (cpu & mem).
> In ResourceTrackerService.registerNodeManager, YARN checks whether the NM
> capacity is at least as much as yarn.scheduler.minimum-allocation-mb and
> yarn.scheduler.minimum-allocation-vcores. These values can be set to 0 in
> yarn-site.xml (so a zero-capacity NM is possible). However, the validateConf
> methods in CapacityScheduler and FifoScheduler do not allow 0 values for
> these properties (the FairScheduler one does). This behaviour should be
> consistent or at least overridable.
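A rough sketch of the proposed shape (my illustration, not the attached
patch): the shared check lives in AbstractYarnScheduler, tolerates a 0
minimum, and remains overridable for schedulers that want stricter rules.
{code}
public abstract class AbstractYarnScheduler /* ... */ {
  // Shared validation; minMem == 0 is allowed so zero-capacity NMs can register.
  protected void validateConf(Configuration conf) {
    int minMem = conf.getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
    int maxMem = conf.getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
    if (minMem < 0 || minMem > maxMem) {
      throw new YarnRuntimeException("Invalid scheduler memory allocation"
          + " configuration: min=" + minMem + ", max=" + maxMem);
    }
  }
}
{code}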



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4462) FairScheduler: Disallow preemption from a queue

2015-12-16 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-4462:
--

Assignee: Tao Jie  (was: Karthik Kambatla)

All yours, [~Tao Jie]. 

> FairScheduler: Disallow preemption from a queue
> ---
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>
> When scheduler preemption is enabled, applications can be preempted if they
> hold more resources than they should.
> When a MapReduce application has some of its resources preempted, it just
> runs slower. However, when the preempted application is a long-running
> service, such as Tomcat running in Slider, the service would fail.
> So we should have a flag for an application to indicate to the scheduler
> that the application should not be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops

2015-12-16 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-2575:
--
Attachment: (was: YARN-2575.v2.patch)

> Consider creating separate ACLs for Reservation create/update/delete/list ops
> -
>
> Key: YARN-2575
> URL: https://issues.apache.org/jira/browse/YARN-2575
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sean Po
> Attachments: YARN-2575.v1.patch
>
>
> YARN-1051 introduces the ReservationSystem and in the current implementation 
> anyone who can submit applications can also submit reservations. This JIRA is 
> to evaluate creating separate ACLs for Reservation create/update/delete ops.
> Depends on YARN-4340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism

2015-12-16 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059829#comment-15059829
 ] 

Sidharta Seethana commented on YARN-3542:
-

Thanks, [~vvasudev]. +1 on the latest patch from me.

> Re-factor support for CPU as a resource using the new ResourceHandler 
> mechanism
> ---
>
> Key: YARN-3542
> URL: https://issues.apache.org/jira/browse/YARN-3542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-3542.001.patch, YARN-3542.002.patch, 
> YARN-3542.003.patch, YARN-3542.004.patch, YARN-3542.005.patch, 
> YARN-3542.006.patch
>
>
> In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier 
> addition of new resource types in the nodemanager (this was used for network 
> as a resource - See YARN-2140 ). We should refactor the existing CPU 
> implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using 
> the new ResourceHandler mechanism. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4445) Unify the term flowId and flowName in timeline v2 codebase

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059904#comment-15059904
 ] 

Hadoop QA commented on YARN-4445:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
5s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 16s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 16s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
5s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 33s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
16s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
59s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 9s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 2s 
{color} | {color:red} Patch generated 9 new checkstyle issues in root (total 
was 147, now 154). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 49s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 54s {color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 107m 33s 
{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 107m 51s 
{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 11s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 

[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-16 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059964#comment-15059964
 ] 

Jun Gong commented on YARN-3480:


Fixed the findbugs and test errors.

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, YARN-3480.09.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps
> are more likely to finish successfully with more retries (attempts), so it
> is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value.
> However, this makes the RMStateStore (FileSystem/HDFS/ZK) store more
> attempts and makes the RM recovery process much slower. It might be better
> to cap the number of attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set
> to a small value, the number of retried attempts might be very large, so we
> need to delete some of the attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2934) Improve handling of container's stderr

2015-12-16 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059943#comment-15059943
 ] 

Gera Shegalov commented on YARN-2934:
-

Hi [~Naganarasimha],
Please make sure that the patch does not introduce new problems. Both 
checkstyle and findbugs report problems related to the patch. Check the Hadoop 
QA comment above. Keep addressing the newly introduced issues without waiting 
for review to simplify the review process. 

I suggest using globs instead of regexes, so you can simply call
FileSystem#globStatus. The path pattern could be something like
{code}{*stderr*,*STDERR*}{code} or maybe {code}{*err,*ERR,*out,*OUT}{code}. I'd
rather have a longer config value than add more code to make patterns
case-insensitive. In practice we mostly need stderr.

Not sure how fancy we need to be with the case where multiple log files qualify 
for the pattern, but maybe at least mention to the user there are more files to 
look at. 

In general, don't try to optimize for the failure case. Things like
{code}
private static long tailSizeInBytes = -1;
{code}
look like a bug. Simply get it from conf exactly when it's needed.
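A minimal sketch of the glob approach, assuming a {{containerLogDir}} variable
and the first pattern above:
{code}
// Glob for candidate stderr files; globStatus expands the {a,b} alternation.
FileSystem fs = FileSystem.getLocal(conf);
FileStatus[] candidates =
    fs.globStatus(new Path(containerLogDir, "{*stderr*,*STDERR*}"));
if (candidates != null && candidates.length > 0) {
  Path errFile = candidates[0].getPath();  // tail this one
  // if candidates.length > 1, at least tell the user other matches exist
}
{code}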


> Improve handling of container's stderr 
> ---
>
> Key: YARN-2934
> URL: https://issues.apache.org/jira/browse/YARN-2934
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Gera Shegalov
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2934.v1.001.patch, YARN-2934.v1.002.patch, 
> YARN-2934.v1.003.patch, YARN-2934.v1.004.patch, YARN-2934.v1.005.patch, 
> YARN-2934.v1.006.patch
>
>
> Most YARN applications redirect stderr to some file. That's why when 
> container launch fails with {{ExitCodeException}} the message is empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-16 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3480:
---
Attachment: YARN-3480.09.patch

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, YARN-3480.09.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps
> are more likely to finish successfully with more retries (attempts), so it
> is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value.
> However, this makes the RMStateStore (FileSystem/HDFS/ZK) store more
> attempts and makes the RM recovery process much slower. It might be better
> to cap the number of attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set
> to a small value, the number of retried attempts might be very large, so we
> need to delete some of the attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4333) Fair scheduler should support preemption within queue

2015-12-16 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059968#comment-15059968
 ] 

Tao Jie commented on YARN-4333:
---

In the attached patch, we append an app-level starvation check to the
queue-level check in the FairScheduler update thread. Only starvation relative
to fair share is checked for an app, since apps have no minShare. We also
record lastTimeAtFairShareThreshold for each appAttempt.
Now, when an app is starving even though its leaf queue is satisfied,
preemption can also be triggered.
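A rough sketch of that app-level check (the FSAppAttempt accessors below, such
as {{getLastTimeAtFairShareThreshold}}, are illustrative rather than the exact
patch):
{code}
private boolean isStarvedForFairShare(FSAppAttempt app, long now) {
  Resource threshold =
      Resources.multiply(app.getFairShare(), fairSharePreemptionThreshold);
  if (Resources.lessThan(RESOURCE_CALCULATOR, clusterResource,
      app.getResourceUsage(), threshold)) {
    // below the fair-share threshold long enough -> starved
    return now - app.getLastTimeAtFairShareThreshold()
        > fairSharePreemptionTimeout;
  }
  app.setLastTimeAtFairShareThreshold(now);  // satisfied: reset the clock
  return false;
}
{code}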

> Fair scheduler should support preemption within queue
> -
>
> Key: YARN-4333
> URL: https://issues.apache.org/jira/browse/YARN-4333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
> Attachments: YARN-4333.001.patch
>
>
> Each app in the fair scheduler is allocated its fair share; however,
> fair-share resources are not guaranteed even if fairSharePreemption is
> enabled.
> Consider:
> 1. When the cluster is idle, we submit app1 to queueA, which takes the
> maxResources of queueA.
> 2. Then the cluster becomes busy, but app1 does not release any resources,
> so queueA's resource usage is over its fair share.
> 3. Then we submit app2 (maybe with higher priority) to queueA. Now app2 has
> its own fair share but cannot obtain any resources, since queueA is still
> over its fair share and no more resources will be assigned to it. Preemption
> is not triggered in this case either.
> So we should allow preemption within a queue when an app is starved for its
> fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4333) Fair scheduler should support preemption within queue

2015-12-16 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated YARN-4333:
--
Attachment: YARN-4333.001.patch

> Fair scheduler should support preemption within queue
> -
>
> Key: YARN-4333
> URL: https://issues.apache.org/jira/browse/YARN-4333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
> Attachments: YARN-4333.001.patch
>
>
> Each app in the fair scheduler is allocated its fair share; however,
> fair-share resources are not guaranteed even if fairSharePreemption is
> enabled.
> Consider:
> 1. When the cluster is idle, we submit app1 to queueA, which takes the
> maxResources of queueA.
> 2. Then the cluster becomes busy, but app1 does not release any resources,
> so queueA's resource usage is over its fair share.
> 3. Then we submit app2 (maybe with higher priority) to queueA. Now app2 has
> its own fair share but cannot obtain any resources, since queueA is still
> over its fair share and no more resources will be assigned to it. Preemption
> is not triggered in this case either.
> So we should allow preemption within a queue when an app is starved for its
> fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4462) Scheduler should prevent certain application from being preempted

2015-12-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060039#comment-15060039
 ] 

Sunil G commented on YARN-4462:
---

Yes, Tao Jie. I understood your scenario, where the cost of preempting one
long-running application is higher. YARN-4108 comes up with this improvement:
there will be policies that users can configure for selecting apps. This
should cover your case.

> Scheduler should prevent certain application from being preempted
> -
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>
> When scheduler preemption is enabled, applications can be preempted if they
> hold more resources than they should.
> When a MapReduce application has some of its resources preempted, it just
> runs slower. However, when the preempted application is a long-running
> service, such as Tomcat running in Slider, the service would fail.
> So we should have a flag for an application to indicate to the scheduler
> that the application should not be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4461) Redundant nodeLocalityDelay log in LeafQueue

2015-12-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060109#comment-15060109
 ] 

Eric Payne commented on YARN-4461:
--

The two failing tests above ({{TestClientRMTokens}} and 
{{TestAMAuthorization}}) both work for me in my local environment.

> Redundant nodeLocalityDelay log in LeafQueue
> 
>
> Key: YARN-4461
> URL: https://issues.apache.org/jira/browse/YARN-4461
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Trivial
> Attachments: YARN-4461.001.patch
>
>
> In LeafQueue#setupQueueConfigs there's a redundant log of nodeLocalityDelay:
> {code}
> "nodeLocalityDelay = " + nodeLocalityDelay + "\n" +
> "labels=" + labelStrBuilder.toString() + "\n" +
> "nodeLocalityDelay = " +  nodeLocalityDelay + "\n" +
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4452) NPE when submit Unmanaged application

2015-12-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060119#comment-15060119
 ] 

Naganarasimha G R commented on YARN-4452:
-

Hi [~djp], thanks for sharing your views.
bq. But may be we can consider to add some unexpectedExceptionHandler to some 
non-critical component (like metrics) so NPE or other exceptions on these 
component won't be necessary to bring down RM.  We can have a separate JIRA to 
fix it if you don't want to address here.
+1 for this approach, but I believe it can be done in another JIRA (as this is
a critical JIRA). If you are fine with it, I will create a new JIRA and work
on that.
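For reference, a rough sketch of what such a guard could look like
({{EventHandler}} and {{Event}} are the real YARN interfaces; the wrapper
itself is hypothetical):
{code}
// Wrap handlers of non-critical components (e.g. metrics) so an unexpected
// RuntimeException is logged instead of bringing down the ResourceManager.
class NonCriticalEventHandler<T extends Event> implements EventHandler<T> {
  private static final Log LOG =
      LogFactory.getLog(NonCriticalEventHandler.class);
  private final EventHandler<T> delegate;

  NonCriticalEventHandler(EventHandler<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public void handle(T event) {
    try {
      delegate.handle(event);
    } catch (RuntimeException e) {
      LOG.error("Ignoring unexpected failure in non-critical handler for "
          + event.getType(), e);
    }
  }
}
{code}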

> NPE when submit Unmanaged application
> -
>
> Key: YARN-4452
> URL: https://issues.apache.org/jira/browse/YARN-4452
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-4452.v1.001.patch, YARN-4452.v1.002.patch
>
>
> As reported in the forum by Wen Lin (w...@pivotal.io)
> {quote}
> [gpadmin@master simple-yarn-app]$ hadoop jar
> ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar
> Client --classpath  ./target/simple-yarn-app-1.1.0.jar -cmd "java
> com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
> {quote}
> The error is as follows:
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type REGISTERED for applicationAttempt
> application_1450079798629_0001
> 664 java.lang.NullPointerException
> 665 at
> org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
> 666 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
> 667 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
> 668 at
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4462) Scheduler should prevent certain application from being preempted

2015-12-16 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060133#comment-15060133
 ] 

Naganarasimha G R commented on YARN-4462:
-

hi [~Tao Jie],
bq. However we apply FairScheduler in our system. I would like to bring 
queue-level-disable-preemption into FairScheduler.
I am not sure about similar support in FS. Maybe you can update the title
accordingly so that the right person can comment on it.

> Scheduler should prevent certain application from being preempted
> -
>
> Key: YARN-4462
> URL: https://issues.apache.org/jira/browse/YARN-4462
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>
> When scheduler preemption is enabled, applications can be preempted if they
> hold more resources than they should.
> When a MapReduce application has some of its resources preempted, it just
> runs slower. However, when the preempted application is a long-running
> service, such as Tomcat running in Slider, the service would fail.
> So we should have a flag for an application to indicate to the scheduler
> that the application should not be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059828#comment-15059828
 ] 

Hadoop QA commented on YARN-3480:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 29s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 39s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Load of known null value in 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$AttemptFailedTransition.removeExcessAttempts(RMAppImpl)
  At RMAppImpl.java:in 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$AttemptFailedTransition.removeExcessAttempts(RMAppImpl)
  At RMAppImpl.java:[line 1363] |
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
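
The FindBugs hit above ("Load of known null value" in removeExcessAttempts) is the pattern where a reference that is provably null on some code path is still loaded and used. A self-contained toy reproduction of the warning and its usual fix; all names here are invented, this is not the RMAppImpl code:

{code}
import java.util.HashMap;
import java.util.Map;

// Toy reproduction of FindBugs' "Load of known null value" warning.
public class KnownNullDemo {
  public static void main(String[] args) {
    Map<String, String> attempts = new HashMap<>();
    attempts.put("appattempt_1", "FAILED");

    String victim = null;
    if (attempts.size() > 1) {
      victim = "appattempt_1";
    }
    // Buggy shape: on the size() <= 1 path, 'victim' is provably null,
    // yet it is still loaded and passed to remove(). FindBugs flags
    // exactly this kind of load.
    attempts.remove(victim);

    // Usual fix: only reach the load on paths where the value is set.
    if (victim != null) {
      attempts.remove(victim);
    }
  }
}
{code}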

[jira] [Created] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2015-12-16 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4465:
--

 Summary: SchedulerUtils#validateRequest for Label check should 
happen only when nodelabel enabled
 Key: YARN-4465
 URL: https://issues.apache.org/jira/browse/YARN-4465
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt


Disable labels on the RM side (yarn.nodelabel.enable=false).
The capacity scheduler label configuration for the queue is as below:
default node label expression for queue b1 = 3, accessible labels = 1,3.
Submit an application to queue A.

{noformat}
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
 Invalid resource request, queue=b1 doesn't have permission to access all 
labels in resource request. labelExpression of resource request=3. Queue 
labels=1,3
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)

{noformat}


# Ignore the default label expression when node labels are disabled (see the 
sketch after this list) *or*
# In NormalizeResourceRequest we can set the label expression to  
when node labels are not enabled *or*
# Improve the error message
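
A minimal sketch of the first option, assuming a boolean that mirrors the RM's node-labels switch; the method shape and the empty-string reset are illustrative assumptions, not the actual SchedulerUtils code:

{code}
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

// Hedged sketch of option 1: skip label validation entirely when node
// labels are disabled. 'nodeLabelsEnabled' is an assumed parameter.
static void normalizeAndValidateRequest(ResourceRequest req,
    boolean nodeLabelsEnabled) throws InvalidResourceRequestException {
  if (!nodeLabelsEnabled) {
    // Ignore any default/queue label expression when the feature is off,
    // so a leftover queue default like "3" cannot fail validation.
    req.setNodeLabelExpression("");
    return;
  }
  // ... existing check of the expression against the queue's
  // accessible node labels ...
}
{code}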



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

