[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057006#comment-15057006
 ] 

Hudson commented on YARN-4439:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8967 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8967/])
YARN-4439. Clarify NMContainerStatus#toString method. Contributed by (xgong: 
rev d8a45425eba372cdebef3be50436b6ddf1c4e192)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java


> Clarify NMContainerStatus#toString method.
> --
>
> Key: YARN-4439
> URL: https://issues.apache.org/jira/browse/YARN-4439
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.3
>
> Attachments: YARN-4439.1.patch, YARN-4439.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057233#comment-15057233
 ] 

Sunil G commented on YARN-3226:
---

Test case failures are known issues and are not related to this patch. [~djp], 
[~rohithsharma], kindly help to check the same. 

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, 
> ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.

2015-12-14 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057119#comment-15057119
 ] 

Xuan Gong commented on YARN-4439:
-

Committed into branch-2.8

> Clarify NMContainerStatus#toString method.
> --
>
> Key: YARN-4439
> URL: https://issues.apache.org/jira/browse/YARN-4439
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.3
>
> Attachments: YARN-4439.1.patch, YARN-4439.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2015-12-14 Thread Matthew Paduano (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Paduano updated YARN-4435:
--
Attachment: proposed_solution

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
> Attachments: proposed_solution
>
>
> Add a class to the YARN project that implements the DtFetcher interface to return 
> an RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2015-12-14 Thread Matthew Paduano (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Paduano updated YARN-4435:
--
Attachment: (was: proposed_solution)

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
> Attachments: proposed_solution
>
>
> Add a class to the YARN project that implements the DtFetcher interface to return 
> an RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057037#comment-15057037
 ] 

Hudson commented on YARN-4418:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #692 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/692/])
YARN-4418. AM Resource Limit per partition can be updated to (wangda: rev 
07b0fb996a32020678bd2ce482b672f0434651f0)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueCapacities.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/QueueCapacities.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java


> AM Resource Limit per partition can be updated to ResourceUsage as well
> ---
>
> Key: YARN-4418
> URL: https://issues.apache.org/jira/browse/YARN-4418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, 
> 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch
>
>
> AMResourceLimit is now extended to all partitions after YARN-3216. It is also 
> better to track this ResourceLimit in the existing {{ResourceUsage}} so that the 
> REST framework can easily expose this information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057039#comment-15057039
 ] 

Hudson commented on YARN-3946:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #692 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/692/])
YARN-3946. Update exact reason as to why a submitted app is in ACCEPTED 
(wangda: rev 6cb0af3c39a5d49cb2f7911ee21363a9542ca2d7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAMContainerLaunchDiagnosticsConstants.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimitsByPartition.java


> Update exact reason as to why a submitted app is in ACCEPTED state to app's 
> diagnostic message
> --
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch, YARN-3946.v1.008.patch
>
>
> Currently there is no direct way to get the exact reason why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through the RM REST API which aspect is not being met - say, queue limits 
> being reached, core/memory requirements not being met, or the AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add container launch related debug information to container logs when a container fails

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057034#comment-15057034
 ] 

Hudson commented on YARN-4309:
--

ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #692 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/692/])
YARN-4309. Add container launch related debug information to container (wangda: 
rev dfcbbddb0963c89c0455d41223427165b9f9e537)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Add container launch related debug information to container logs when a 
> container fails
> ---
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch, YARN-4309.010.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057106#comment-15057106
 ] 

Jian He edited comment on YARN-4138 at 12/15/15 1:06 AM:
-

{code}
SchedContainerChangeRequest decreaseRequest =
    new SchedContainerChangeRequest(
        schedulerNode, rmContainer,
        rmContainer.getLastConfirmedResource());
decreaseContainer(decreaseRequest,
    getCurrentAttemptForContainer(containerId));
{code}
- This scenario may cause incorrect resource accounting; correct me if I'm wrong:
1) AM asks to increase 2G -> 8G
2) AM does not increase the container, and asks to decrease to 1G
3) LastConfirmedResource becomes 1G
4) In the meantime, the containerIncreaseExpiration logic is triggered and 
rollbackContainerResource is invoked. In this case the resource delta becomes 
positive even in the decrease case, but some code assumes the decrease to be 
negative, which may make the resource accounting wrong?
{code}
// Delta capacity is negative when it's a decrease request
Resource absDelta = Resources.negate(decreaseRequest.getDeltaCapacity());
{code}
- I have a question about the API semantics for the above-mentioned scenario. 
According to the AMRMClient#requestContainerResourceChange API, the previous 
pending resource-change-request should be cancelled. Essentially, the semantics 
is that of a setter API. In that sense, the previous 8G should be cancelled. 
With this approach, both resource-change-requests are cancelled. That is, 10 min 
after the expiration is triggered, the user will suddenly see the container 
decreased to 2 GB. Will this confuse the user?
- revert format-only changes in RMContainerChangeResourceEvent




was (Author: jianhe):
{code}
SchedContainerChangeRequest decreaseRequest =
    new SchedContainerChangeRequest(
        schedulerNode, rmContainer,
        rmContainer.getLastConfirmedResource());
decreaseContainer(decreaseRequest,
    getCurrentAttemptForContainer(containerId));
{code}
- This scenario may cause incorrect resource accounting; correct me if I'm wrong:
1) AM asks to increase 2G -> 8G
2) AM does not increase the container, and asks to decrease to 1G
3) LastConfirmedResource becomes 1G
4) In the meantime, the containerIncreaseExpiration logic is triggered and 
rollbackContainerResource is invoked. In this case the resource delta becomes 
positive even in the decrease case, but some code assumes the decrease to be 
negative, which may make the resource accounting wrong?
{code}
// Delta capacity is negative when it's a decrease request
Resource absDelta = Resources.negate(decreaseRequest.getDeltaCapacity());
{code}
- I have a question about the API semantics for the above-mentioned scenario. 
According to the AMRMClient#requestContainerResourceChange API, the previous 
pending resource-change-request should be cancelled. Essentially, the semantics 
is that of a setter API. In that sense, the previous 8G should be cancelled. 
With this approach, both resource-change-requests are cancelled. That is, 10 min 
after the expiration is triggered, the user will suddenly see the container 
decreased to 2 GB. Will this confuse the user?
- revert RMContainerChangeResourceEvent



> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-14 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057134#comment-15057134
 ] 

Xianyin Xin commented on YARN-4415:
---

Hi [~leftnoteasy], thanks for your comments.
{quote}
You can see that there're different pros and cons to choose default values of 
the two options. Frankly I don't have strong preference for all these choices. 
But since we have decided default values since 2.6, I would suggest don't 
change the default values.
{quote}
I understand and respect your choice. The pros and cons are just the two sides 
of a coin; we must choose one. But I just feel it strange that the 
access-labels are "\*" but in fact we can't access them, so in this case "\*" 
means nothing except that it is just a symbol, or an abbreviation of all labels. 
(What I mean is that it is somewhat counterintuitive when one sees 
"*"; I think Naga has the same sense.) You can claim that the access-labels and 
max-capacities are two things and if we want to use them, we must set the two 
separately and explicitly. If we finally choose that way of working, I will 
reserve my opinion. At last, thanks again. :)

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png, 
> capacity-scheduler.xml, screenshot-1.png
>
>
> Steps to reproduce the issue:
> Scenario 1:
> # Configure a queue (default) with accessible node labels as *
> # create an exclusive partition *xxx* and map an NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # the application is stuck but the scheduler UI shows 100% as max capacity for that 
> queue
> Scenario 2:
> # create a non-exclusive partition *sharedPartition* and map an NM to it
> # ensure no capacities are configured for the default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # the application is stuck but the scheduler UI shows 100% as max capacity for that 
> queue for *sharedPartition*
> For both scenarios the cause is the same: default max capacity and absolute max 
> capacity are set to zero %



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057201#comment-15057201
 ] 

Sunil G commented on YARN-4418:
---

Thank you very much, [~leftnoteasy], for the review and commit. 

> AM Resource Limit per partition can be updated to ResourceUsage as well
> ---
>
> Key: YARN-4418
> URL: https://issues.apache.org/jira/browse/YARN-4418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, 
> 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch
>
>
> AMResourceLimit is now extended to all partitions after YARN-3216. It is also 
> better to track this ResourceLimit in the existing {{ResourceUsage}} so that the 
> REST framework can easily expose this information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4340) Add "list" API to reservation system

2015-12-14 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4340:
--
Attachment: YARN-4340.v9.patch

Thanks, Subru, for the comments. I incorporated them into this latest patch. 
After doing that, I noticed that I am not able to replicate the incorrect 
behavior that you mentioned.

> Add "list" API to reservation system
> 
>
> Key: YARN-4340
> URL: https://issues.apache.org/jira/browse/YARN-4340
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
> Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, 
> YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, 
> YARN-4340.v6.patch, YARN-4340.v7.patch, YARN-4340.v8.patch, YARN-4340.v9.patch
>
>
> This JIRA tracks changes to the APIs of the reservation system, and enables 
> querying the reservation system for which reservations exist by "time-range, 
> reservation-id".
> YARN-4420 has a dependency on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-14 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057229#comment-15057229
 ] 

sandflee commented on YARN-4138:


Got it, thanks for your explanation!

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057144#comment-15057144
 ] 

Eric Payne commented on YARN-4225:
--

bq. Could you check findbugs warning in latest Jenkins run is related or not? 
There's no link to findbugs result in latest Jenkins report, so I guess it's 
not related.
[~leftnoteasy], is there something wrong with this build? I can get to 
https://builds.apache.org/job/PreCommit-YARN-Build/9968, but many of the other 
links in the comment above do not work. For example, 
https://builds.apache.org/job/PreCommit-YARN-Build/9968/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66.txt
 gets a 404. I tried to get to the artifacts page, but that comes up 404 also.

I didn't find any findbugs report.

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057106#comment-15057106
 ] 

Jian He commented on YARN-4138:
---

{code}
SchedContainerChangeRequest decreaseRequest =
    new SchedContainerChangeRequest(
        schedulerNode, rmContainer,
        rmContainer.getLastConfirmedResource());
decreaseContainer(decreaseRequest,
    getCurrentAttemptForContainer(containerId));
{code}
- This scenario may cause incorrect resource accounting; correct me if I'm wrong:
1) AM asks to increase 2G -> 8G
2) AM does not increase the container, and asks to decrease to 1G
3) LastConfirmedResource becomes 1G
4) In the meantime, the containerIncreaseExpiration logic is triggered and 
rollbackContainerResource is invoked. In this case the resource delta becomes 
positive even in the decrease case, but some code assumes the decrease to be 
negative, which may make the resource accounting wrong?
{code}
// Delta capacity is negative when it's a decrease request
Resource absDelta = Resources.negate(decreaseRequest.getDeltaCapacity());
{code}
- I have a question about the API semantics for the above-mentioned scenario. 
According to the AMRMClient#requestContainerResourceChange API, the previous 
pending resource-change-request should be cancelled. Essentially, the semantics 
is that of a setter API. In that sense, the previous 8G should be cancelled. 
With this approach, both resource-change-requests are cancelled. That is, 10 min 
after the expiration is triggered, the user will suddenly see the container 
decreased to 2 GB. Will this confuse the user?
- revert RMContainerChangeResourceEvent
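
To make the sign concern above concrete, here is a minimal, self-contained sketch 
(an editor's illustration under assumed semantics, not any patch on this JIRA): 
it computes the magnitude of a change without assuming that a decrease always 
carries a negative delta, using standard {{Resources}} helpers ({{fitsIn}}, 
{{none}}, {{negate}}).
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class DeltaSignSketch {
  /** Magnitude of a resource change regardless of its direction. */
  static Resource absDelta(Resource delta) {
    // fitsIn(delta, none()) holds only when every component of delta is <= 0,
    // i.e. a genuine decrease; on the rollback path the delta can be positive,
    // so keep it as-is in that case instead of negating it.
    return Resources.fitsIn(delta, Resources.none())
        ? Resources.negate(delta)
        : delta;
  }
}
{code}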



> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events

2015-12-14 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057190#comment-15057190
 ] 

Sangjin Lee commented on YARN-4247:
---

FYI, those who port YARN-2005 without YARN-3361 will run into this issue pretty 
easily. If we ever decide to backport YARN-2005 to 2.6.x or 2.7.x, YARN-3361 
needs to be backported too or this should be fixed in the way this patch 
suggests.

There are a couple of things that are not quite correct with the patch.
- the call to {{hasMasterContainer()}} in {{ScheduledApplicationAttempt}} is 
inverted: it should be {{!hasMasterContainer()}}
- {{masterContainer}} should be {{volatile}} to preserve memory visibility

Adding these comments for posterity.
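
As a side note, a minimal sketch of the two corrections described above (class 
and field names taken from the comment; the surrounding logic is assumed, not 
the actual patch):
{code}
import org.apache.hadoop.yarn.api.records.Container;

class AttemptSketch {
  // volatile so a reader thread observes the master container set by the
  // thread that assigns it
  private volatile Container masterContainer;

  boolean hasMasterContainer() {
    return masterContainer != null;
  }

  void runStepThatMustHappenBeforeAmAllocation() {
    // the guard must be negated: only take this path while the AM container
    // has NOT yet been assigned
    if (!hasMasterContainer()) {
      // ... original logic ...
    }
  }
}
{code}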

> Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing 
> events
> -
>
> Key: YARN-4247
> URL: https://issues.apache.org/jira/browse/YARN-4247
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-4247.001.patch, YARN-4247.001.patch
>
>
> We see this deadlock in our testing where events do not get processed and we 
> see this in the logs before the RM dies of OOM {noformat} 2015-10-08 
> 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of 
> event-queue is 1488000 2015-10-08 04:48:01,918 INFO 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications

2015-12-14 Thread Mohammad Shahid Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055734#comment-15055734
 ] 

Mohammad Shahid Khan commented on YARN-4441:


Hi [#Varun Vasudev], why send the kill request if the application is already 
finished?

Please check the ApplicationCLI killApplication API; we have a similar check 
before invoking the RPC call to kill the app.

{code}
if (appReport.getYarnApplicationState() == YarnApplicationState.FINISHED
    || appReport.getYarnApplicationState() == YarnApplicationState.KILLED
    || appReport.getYarnApplicationState() == YarnApplicationState.FAILED) {
  sysout.println("Application " + applicationId + " has already finished ");
} else {
  sysout.println("Killing application " + applicationId);
  client.killApplication(appId);
}
{code}

> Kill application request from the webservice(ui) is showing success even for 
> the finished applications
> --
>
> Key: YARN-4441
> URL: https://issues.apache.org/jira/browse/YARN-4441
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>
> If the application has already finished, i.e. either failed, killed, or succeeded,
> the kill operation should not be logged as success. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications

2015-12-14 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055747#comment-15055747
 ] 

Varun Vasudev commented on YARN-4441:
-

That's a performance optimization for the RPC client - it avoids an RPC round 
trip. There's nothing stopping you from writing a YARN client without that 
check. The equivalent to the CLI code you posted would be to grey out the 
button on the web UI if the application is finished (which is a patch I'd be ok 
with).
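
For illustration, a client-side guard equivalent to the CLI snippet quoted 
earlier in this thread could look like the following sketch (editor's example 
using the public {{YarnClient}} API; the class and method names here are not 
part of any patch on this JIRA):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

class KillIfRunning {
  /** Skip the kill RPC when the application has already reached a final state. */
  static void killIfStillRunning(YarnClient client, ApplicationId appId)
      throws Exception {
    ApplicationReport report = client.getApplicationReport(appId);
    YarnApplicationState state = report.getYarnApplicationState();
    if (state == YarnApplicationState.FINISHED
        || state == YarnApplicationState.KILLED
        || state == YarnApplicationState.FAILED) {
      System.out.println("Application " + appId + " has already finished");
    } else {
      System.out.println("Killing application " + appId);
      client.killApplication(appId);
    }
  }
}
{code}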

> Kill application request from the webservice(ui) is showing success even for 
> the finished applications
> --
>
> Key: YARN-4441
> URL: https://issues.apache.org/jira/browse/YARN-4441
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>
> If the application has already finished, i.e. either failed, killed, or succeeded,
> the kill operation should not be logged as success. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4451) Some improvements required in Dump scheduler logs

2015-12-14 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055676#comment-15055676
 ] 

Varun Vasudev commented on YARN-4451:
-

Agree with points (1), (2), and (5).

Disagree with point (3). One of the reasons we used the same name is to avoid 
filling up the disk and having to manage the disk space.

With regard to point (4) - I didn't test the feature to make sure it works 
with the FairScheduler - have you checked that it generates the logs correctly?

> Some improvements required in Dump scheduler logs
> -
>
> Key: YARN-4451
> URL: https://issues.apache.org/jira/browse/YARN-4451
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> Though dumping scheduler logs is a very useful option, there are a few nits in 
> using it:
> * for a naive or first-time user it is hard to understand whether {{"Time"}} 
> stands for the past or the future; IMO it would be slightly better to set the name 
> in the UI as {{"Time Period"}}
> * the success message should state where the logs will be found and the file name
> * need to append the timestamp and the period to the file name, so that it is 
> not overridden
> * from the code it seems like it always returns {{"Capacity scheduler logs are 
> being created"}} even though the fair scheduler is set
> * would having a CLI option in {{"yarn rmadmin"}} also be helpful? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4452) NPE when submit Unmanaged application

2015-12-14 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4452:
---

 Summary: NPE when submit Unmanaged application
 Key: YARN-4452
 URL: https://issues.apache.org/jira/browse/YARN-4452
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Critical


As reported in the forum by Wen Lin (w...@pivotal.io)
{quote}
[gpadmin@master simple-yarn-app]$ hadoop jar
~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar
Client --classpath  ./target/simple-yarn-app-1.1.0.jar -cmd "java
com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
{quote}
The error is:
{code}
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type REGISTERED for applicationAttempt
application_1450079798629_0001
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
{code}
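
If the NPE indeed stems from the unmanaged AM having no AM container (an 
assumption based on the stack trace above, since {{getMasterContainer()}} would 
then return null inside {{appAttemptRegistered}}), a defensive guard would look 
roughly like this sketch (illustrative only, not the committed fix):
{code}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt;

class PublisherNullGuardSketch {
  /** Returns the AM container id, or null for unmanaged AMs. */
  static ContainerId masterContainerIdOrNull(RMAppAttempt appAttempt) {
    Container master = appAttempt.getMasterContainer();
    // unmanaged AMs never get an AM container, so master can legitimately be null
    return master == null ? null : master.getId();
  }
}
{code}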



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4451) Some improvements required in Dump scheduler logs

2015-12-14 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055722#comment-15055722
 ] 

Varun Vasudev commented on YARN-4451:
-

bq. Well, I guessed that would be the reason, but the problem is it overwrites 
without any warning/alert! So either we need to provide a message that the file 
will be overwritten, or maybe while creating we can keep a configurable number of 
logs, say 5.

Can you do a sizing test to see how big the log is?

bq. One more issue is when I tried the REST URL "http:///ws/v1/cluster/scheduler/logs" using wget, I am getting a 
WebApplicationException; just wanted to confirm whether I am missing something!

What HTTP method are you using? That URL only supports POST.
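
For reference, a POST works where a plain GET does not; a minimal example 
(host/port are placeholders, and the {{time}} form parameter, i.e. the dump 
period in seconds, is assumed from the scheduler-log-dump feature rather than 
taken from this JIRA):
{code}
# POST the dump request; adjust the RM address for your cluster
curl -X POST -d "time=60" "http://<rm-host>:8088/ws/v1/cluster/scheduler/logs"
{code}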

> Some improvements required in Dump scheduler logs
> -
>
> Key: YARN-4451
> URL: https://issues.apache.org/jira/browse/YARN-4451
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> Though dumping scheduler logs is a very useful option, there are a few nits in 
> using it:
> * for a naive or first-time user it is hard to understand whether {{"Time"}} 
> stands for the past or the future; IMO it would be slightly better to set the name 
> in the UI as {{"Time Period"}}
> * the success message should state where the logs will be found and the file name
> * need to append the timestamp and the period to the file name, so that it is 
> not overridden
> * from the code it seems like it always returns {{"Capacity scheduler logs are 
> being created"}} even though the fair scheduler is set
> * would having a CLI option in {{"yarn rmadmin"}} also be helpful? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4451) Some improvements required in Dump scheduler logs

2015-12-14 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055722#comment-15055722
 ] 

Varun Vasudev edited comment on YARN-4451 at 12/14/15 9:58 AM:
---

bq. Well, I guessed that would be the reason, but the problem is it overwrites 
without any warning/alert! So either we need to provide a message that the file 
will be overwritten, or maybe while creating we can keep a configurable number of 
logs, say 5.

Can you do a sizing test to see how big the log is?

bq. One more issue is when I tried the REST URL "http:///ws/v1/cluster/scheduler/logs" using wget, I am getting a 
WebApplicationException; just wanted to confirm whether I am missing something!

What HTTP method are you using? That URL only supports POST.


was (Author: vvasudev):
bq. Well, I guessed that would be the reason, but the problem is it overwrites 
without any warning/alert! So either we need to provide a message that the file 
will be overwritten, or maybe while creating we can keep a configurable number of 
logs, say 5.

Can you do a sizing test to see how big the log is?

bq. One more issue is when I tried the REST URL "http:///ws/v1/cluster/scheduler/logs" using wget, I am getting a 
WebApplicationException; just wanted to confirm whether I am missing something!

What HTTP method are you using? That URL only supports POST.

> Some improvements required in Dump scheduler logs
> -
>
> Key: YARN-4451
> URL: https://issues.apache.org/jira/browse/YARN-4451
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> Though dumping scheduler logs is a very useful option, there are a few nits in 
> using it:
> * for a naive or first-time user it is hard to understand whether {{"Time"}} 
> stands for the past or the future; IMO it would be slightly better to set the name 
> in the UI as {{"Time Period"}}
> * the success message should state where the logs will be found and the file name
> * need to append the timestamp and the period to the file name, so that it is 
> not overridden
> * from the code it seems like it always returns {{"Capacity scheduler logs are 
> being created"}} even though the fair scheduler is set
> * would having a CLI option in {{"yarn rmadmin"}} also be helpful? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4451) Some improvements required in Dump scheduler logs

2015-12-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055713#comment-15055713
 ] 

Naganarasimha G R commented on YARN-4451:
-

Hi [~vvasudev], thanks for the feedback.
bq. One of the reasons we used the same name is to avoid filling up the disk 
and having to manage the disk space.
Well, I guessed that would be the reason, but the problem is it overwrites 
without any warning/alert! So either we need to provide a message that the file 
will be overwritten, or maybe while creating we can keep a configurable number of 
logs, say 5.

bq. I didn't test the feature to make sure it works with FairScheduler - have 
you checked that it generates the logs correctly?
Neither did I test it; I was just checking the patch in the JIRA to see if it 
supports any CLI, and found this log/return message.

Also, we observed that only DEBUG logs were present and *not* the INFO logs, and 
usually we do not put debug logs for the same info log message, so I was 
wondering whether it is feasible to collect INFO logs too in the same log file so 
that analysis is faster?

And one more issue: when I tried the REST URL "http:///ws/v1/cluster/scheduler/logs" using wget, I am getting a 
WebApplicationException; just wanted to confirm whether I am missing something!



> Some improvements required in Dump scheduler logs
> -
>
> Key: YARN-4451
> URL: https://issues.apache.org/jira/browse/YARN-4451
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> Though dumping scheduler logs is a very useful option, there are a few nits in 
> using it:
> * for a naive or first-time user it is hard to understand whether {{"Time"}} 
> stands for the past or the future; IMO it would be slightly better to set the name 
> in the UI as {{"Time Period"}}
> * the success message should state where the logs will be found and the file name
> * need to append the timestamp and the period to the file name, so that it is 
> not overridden
> * from the code it seems like it always returns {{"Capacity scheduler logs are 
> being created"}} even though the fair scheduler is set
> * would having a CLI option in {{"yarn rmadmin"}} also be helpful? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4324) AM hang more than 10 min was kill by RM

2015-12-14 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen updated YARN-4324:
---
Attachment: yarn-nodemanager-dumpam.log

> AM hang more than 10 min was kill by RM
> ---
>
> Key: YARN-4324
> URL: https://issues.apache.org/jira/browse/YARN-4324
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: tangshangwen
> Attachments: yarn-nodemanager-dumpam.log
>
>
> these are my logs:
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING   
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition
> ed from UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT  
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a 
> signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
> notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> the Hive map ran to 100%, then went back to map 0%, and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode

2015-12-14 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3857:

Fix Version/s: 2.6.4

> Memory leak in ResourceManager with SIMPLE mode
> ---
>
> Key: YARN-3857
> URL: https://issues.apache.org/jira/browse/YARN-3857
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: mujunchao
>Assignee: mujunchao
>Priority: Critical
>  Labels: patch
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, 
> YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch
>
>
>  We register the ClientTokenMasterKey to avoid the client holding an invalid 
> ClientToken after the RM restarts. In SIMPLE mode, we register the 
> pair, but we never remove it from the HashMap, as 
> unregister only runs while in secure mode, so a memory leak results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode

2015-12-14 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057421#comment-15057421
 ] 

zhihai xu commented on YARN-3857:
-

Yes, this issue exists in 2.6.x, I just committed this patch to branch-2.6.

> Memory leak in ResourceManager with SIMPLE mode
> ---
>
> Key: YARN-3857
> URL: https://issues.apache.org/jira/browse/YARN-3857
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: mujunchao
>Assignee: mujunchao
>Priority: Critical
>  Labels: patch
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, 
> YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch
>
>
>  We register the ClientTokenMasterKey to avoid the client holding an invalid 
> ClientToken after the RM restarts. In SIMPLE mode, we register the 
> pair, but we never remove it from the HashMap, as 
> unregister only runs while in secure mode, so a memory leak results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED

2015-12-14 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3535:

Fix Version/s: 2.6.4

> Scheduler must re-request container resources when RMContainer transitions 
> from ALLOCATED to KILLED
> ---
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, 
> YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NM, the AM's start of a container on the NM failed, 
> and then the job hung there.
> AM logs attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED

2015-12-14 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057536#comment-15057536
 ] 

zhihai xu commented on YARN-3535:
-

Yes, this issue exists in 2.6.x, I just committed this patch to branch-2.6.

> Scheduler must re-request container resources when RMContainer transitions 
> from ALLOCATED to KILLED
> ---
>
> Key: YARN-3535
> URL: https://issues.apache.org/jira/browse/YARN-3535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Peng Zhang
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, 
> 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, 
> YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NM, the AM's start of a container on the NM failed, 
> and then the job hung there.
> AM logs attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-12-14 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056084#comment-15056084
 ] 

Lars Francke commented on YARN-1051:


Is there any documentation on this beside the design doc and the patch itself?

I still have trouble fully understanding how this is implemented/used.

> YARN Admission Control/Planner: enhancing the resource allocation model with 
> time.
> --
>
> Key: YARN-1051
> URL: https://issues.apache.org/jira/browse/YARN-1051
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
> YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
> techreport.pdf
>
>
> In this umbrella JIRA we propose to extend the YARN RM to handle time 
> explicitly, allowing users to "reserve" capacity over time. This is an 
> important step towards SLAs, long-running services, workflows, and helps for 
> gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4451) Some improvements required in Dump scheduler logs

2015-12-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055837#comment-15055837
 ] 

Naganarasimha G R commented on YARN-4451:
-

bq. Can you do a sizing test to see how big the log is?
It took around 3 MB for 5 minutes in a 7-node cluster with one app (and for a 
3-node cluster it was also more or less the same with a single app).

bq. What HTTP method are you using? That URL only supports POST.
My mistake, I tried with GET. Also, if the time is not passed, do we need to 
consider a default of 1 min?

How about the other question:
Also, we observed that only DEBUG logs were present and not the INFO logs, and 
usually we do not put debug logs for the same info log message, so I was 
wondering whether it is feasible to collect INFO logs too in the same log file so 
that analysis is faster?

> Some improvements required in Dump scheduler logs
> -
>
> Key: YARN-4451
> URL: https://issues.apache.org/jira/browse/YARN-4451
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> Though dumping scheduler logs is a very useful option, there are a few nits in 
> using it:
> * for a naive or first-time user it is hard to understand whether {{"Time"}} 
> stands for the past or the future; IMO it would be slightly better to set the name 
> in the UI as {{"Time Period"}}
> * the success message should state where the logs will be found and the file name
> * need to append the timestamp and the period to the file name, so that it is 
> not overridden
> * from the code it seems like it always returns {{"Capacity scheduler logs are 
> being created"}} even though the fair scheduler is set
> * would having a CLI option in {{"yarn rmadmin"}} also be helpful? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4350) TestDistributedShell fails

2015-12-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056150#comment-15056150
 ] 

Varun Saxena commented on YARN-4350:


Discussed offline with Naga.
He will cherry pick YARN-4392 into this branch first.
I will commit it afterwards.

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.001.patch, 
> YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not when run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
>   Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published 
> atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They start happening after YARN-4129. I suspect this might have to do with 
> some timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056099#comment-15056099
 ] 

Sunil G commented on YARN-3226:
---

Hi [~rohithsharma]
Thanks for pointing out the same.

{{updateMetricsForGracefulDecommission}} is the new generic method which will 
handle what {{updateMetricsForGracefulDecommissionOnUnhealthyNode}} was doing. 
Hence the latter method is not used.

I will remove it as it is no longer needed. Will update a patch now. Is 
this OK?

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2015-12-14 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4414:
---
Attachment: YARN-4414.1.2.patch

> Nodemanager connection errors are retried at multiple levels
> 
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, 
> YARN-4414.1.patch
>
>
> This is related to YARN-3238.  Ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException.  
> The fix for YARN-3238 was too specific, and I think we need a more general 
> solution to catch a wider array of connection errors that can occur to avoid 
> retrying them both at the RPC layer and at the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056105#comment-15056105
 ] 

Rohith Sharma K S commented on YARN-3226:
-

If it is unused, it is better to remove it; otherwise it becomes stale code. Thanks 
for the clarification :-)

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056129#comment-15056129
 ] 

Sunil G commented on YARN-4293:
---

Test failures are not related.
I have verified locally and seen that a few of them fail without this patch as 
well, and I have raised a ticket for the same. I think the change is impacting 
more test suites and hence we are seeing these timeouts.

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4293.patch, 0002-YARN-4293.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056200#comment-15056200
 ] 

Varun Saxena commented on YARN-4224:


So I looked at [~leftnoteasy]'s code at YARN-3368. I see that for a single record, 
like a single app attempt, we are extending urlForFindRecord, which takes only a 
single string id as input instead of an object as is the case with urlForQuery. In 
the case of app attempts and containers, we can get the app id from the app 
attempt id and the app attempt id from the container id, so a single id is enough.
In our case no such relationship exists between cluster, user, flow, etc. Is this 
why we need the UID? And do we want to fetch it from the server side so that the 
UID encoding can easily be changed in the future? Is my understanding correct?

By the way, what are the implications of calling query instead of findRecord? I 
guess multiple fields can be passed when we call urlForQuery.

Moreover, what do you mean by a batch query? Does that mean support for multiple 
optional query parameters, like filters, to trim down the results? We already 
have those.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4224-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4454) NM to nodelabel mapping going wrong after RM restart

2015-12-14 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4454:
--

 Summary: NM to nodelabel mapping going wrong after RM restart
 Key: YARN-4454
 URL: https://issues.apache.org/jira/browse/YARN-4454
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical


*Steps to reproduce*

1. Create a cluster with 2 NMs
2. Add labels X,Y to the cluster
3. Replace the label of node 1 using ,x
4. Replace the label of node 1 using ,y
5. Again replace the label of node 1 using ,x

Check the cluster label mapping: HOSTNAME1 will be mapped to X.

Now restart the RM 2 times; the node label mapping of HOSTNAME1:PORT changes to Y.

{noformat}
2015-12-14 17:17:54,901 INFO 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: 

[jira] [Updated] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3226:
--
Attachment: 0005-YARN-3226.patch

Attaching an updated patch addressing the comments from [~rohithsharma].

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, 
> ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4324) AM hang more than 10 min was kill by RM

2015-12-14 Thread tangshangwen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057380#comment-15057380
 ] 

tangshangwen commented on YARN-4324:


Because the job failure is random, I dumped the AM jstack and pstack when the AM 
went from RUNNING to KILLING, and I have uploaded my log.

> AM hang more than 10 min was kill by RM
> ---
>
> Key: YARN-4324
> URL: https://issues.apache.org/jira/browse/YARN-4324
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: tangshangwen
>
> this is my logs
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING   
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition
> ed from UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT  
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a 
> signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
> notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> the hive map run 100% and return map 0% and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4324) AM hang more than 10 min was kill by RM

2015-12-14 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen updated YARN-4324:
---
Attachment: logs.rar

I have uploaded the new jstack and AM logs.

> AM hang more than 10 min was kill by RM
> ---
>
> Key: YARN-4324
> URL: https://issues.apache.org/jira/browse/YARN-4324
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: tangshangwen
> Attachments: logs.rar, yarn-nodemanager-dumpam.log
>
>
> this is my logs
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING   
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition
> ed from UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT  
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a 
> signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
> notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> the hive map run 100% and return map 0% and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057337#comment-15057337
 ] 

Hudson commented on YARN-4403:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #693 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/693/])
YARN-4403. (AM/NM/Container)LivelinessMonitor should use monotonic time 
(jianhe: rev 1cb3299b48a06a842aa3f6cf37ccf44a49af43b5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/SystemClock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/MonotonicClock.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NMLivelinessMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/ContainerAllocationExpirer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java


> (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating 
> period
> 
>
> Key: YARN-4403
> URL: https://issues.apache.org/jira/browse/YARN-4403
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-4403-v2.patch, YARN-4403.patch
>
>
> Currently, (AM/NM/Container)LivelinessMonitor uses the current system time to 
> calculate the expiry duration, which can be broken by settimeofday. We 
> should use Time.monotonicNow() instead.
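
A minimal, self-contained sketch (not the actual patch) of why the monotonic clock 
matters here; the class name below is made up for illustration. 
System.currentTimeMillis() can jump if the wall clock is changed via settimeofday, 
while System.nanoTime(), which Time.monotonicNow() is understood to wrap, only ever 
moves forward:
{code}
public class ExpiryCheckSketch {
  private static final long EXPIRE_INTERVAL_MS = 10 * 60 * 1000; // 10 minutes

  public static void main(String[] args) throws InterruptedException {
    // Record the last heartbeat on the monotonic clock, not the wall clock.
    long lastPingMs = System.nanoTime() / 1_000_000L;

    Thread.sleep(100); // pretend some time passes between heartbeats

    long nowMs = System.nanoTime() / 1_000_000L;
    // This difference is immune to settimeofday; with currentTimeMillis()
    // a clock jump could expire (or never expire) a healthy node.
    boolean expired = (nowMs - lastPingMs) > EXPIRE_INTERVAL_MS;
    System.out.println("expired = " + expired);
  }
}
{code}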



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4402) TestNodeManagerShutdown And TestNodeManagerResync fails with bind exception

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057339#comment-15057339
 ] 

Hudson commented on YARN-4402:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #693 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/693/])
YARN-4402. TestNodeManagerShutdown And TestNodeManagerResync fails with 
(jianhe: rev 915cd6c3f43f32b3ee13aceee68b5e86455e79f2)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java


> TestNodeManagerShutdown And TestNodeManagerResync fails with bind exception
> ---
>
> Key: YARN-4402
> URL: https://issues.apache.org/jira/browse/YARN-4402
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4402.patch
>
>
> https://builds.apache.org/job/Hadoop-Yarn-trunk/1465/testReport/
> {noformat}
> 2015-12-01 04:56:07,150 INFO  [main] http.HttpServer2 
> (HttpServer2.java:start(846)) - HttpServer.start() threw a non Bind 
> IOException
> java.net.BindException: Port in use: 0.0.0.0:8042
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:906)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:843)
>   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer.serviceStart(WebServer.java:73)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:368)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.testContainerPreservationOnResyncImpl(TestNodeManagerResync.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.testKillContainersOnResync(TestNodeManagerResync.java:141)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057336#comment-15057336
 ] 

Hudson commented on YARN-4439:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #693 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/693/])
YARN-4439. Clarify NMContainerStatus#toString method. Contributed by (xgong: 
rev d8a45425eba372cdebef3be50436b6ddf1c4e192)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java


> Clarify NMContainerStatus#toString method.
> --
>
> Key: YARN-4439
> URL: https://issues.apache.org/jira/browse/YARN-4439
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.7.3
>
> Attachments: YARN-4439.1.patch, YARN-4439.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4324) AM hang more than 10 min was kill by RM

2015-12-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057412#comment-15057412
 ] 

Rohith Sharma K S commented on YARN-4324:
-

Thanks for the jstack report! Would you provide the AM and RM logs?


> AM hang more than 10 min was kill by RM
> ---
>
> Key: YARN-4324
> URL: https://issues.apache.org/jira/browse/YARN-4324
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: tangshangwen
> Attachments: yarn-nodemanager-dumpam.log
>
>
> this is my logs
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING   
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition
> ed from UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT  
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a 
> signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
> notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> the hive map run 100% and return map 0% and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-14 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057501#comment-15057501
 ] 

Jun Gong commented on YARN-3480:


[~jianhe] Thanks for the review and suggestions.

{quote}
how about removing the attempts that are beyond the max-allowed-attempts 
instead of the ones beyond the validity interval ? this way, we can keep more 
reasonable amount of history.
{quote}
OK. In earlier patches, I did it in this way.  Then max-allowed-attempts will 
be a global hard limit.

{quote}
Instead of introducing the dummyAttempt in the RMApp, we can change the caller 
to always find the current attempt for container by using 
AbstractYarnScheduler#getCurrentAttemptForContainer API. This way, the 
container events can be routed to the current attempts instead of old one.
{quote}
The current attempt might be in any state and cannot handle some container events; 
e.g. when an attempt is in RMAppAttemptState.NEW, it cannot handle the event 
RMAppAttemptEventType.CONTAINER_FINISHED. In order not to make the attempt's state 
transitions more complex, we introduce a 'dummyAttempt': it is in a final state 
(because it represents a finished attempt), e.g. RMAppAttemptState.FAILED, and it 
can handle any RMAppAttemptEventType.* event. Is it OK?
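
A simplified, self-contained sketch of the 'dummyAttempt' idea described above. The 
types here are stand-ins, not the real RMAppAttempt classes, and the actual patch 
may differ:
{code}
public class DummyAttemptSketch {
  enum AttemptEventType { LAUNCHED, REGISTERED, CONTAINER_FINISHED, KILL }

  interface AttemptEventHandler {
    void handle(AttemptEventType event);
  }

  /**
   * Placeholder for attempts that were trimmed from the state store. It sits in a
   * final (FAILED-like) state and safely absorbs every event type, so no live
   * attempt is forced through an invalid state transition.
   */
  static class DummyAttempt implements AttemptEventHandler {
    @Override
    public void handle(AttemptEventType event) {
      System.out.println("Ignoring " + event + " for a removed attempt");
    }
  }

  public static void main(String[] args) {
    AttemptEventHandler dummy = new DummyAttempt();
    // Events for old, removed attempts are routed here instead of the current attempt.
    dummy.handle(AttemptEventType.CONTAINER_FINISHED);
    dummy.handle(AttemptEventType.KILL);
  }
}
{code}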

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, YARN-3480.06.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries(attempts), so it 
> will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However 
> it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make 
> RM recover process much slower. It might be better to set max attempts to be 
> stored in RMStateStore.
> BTW: When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to 
> a small value, retried attempts might be very large. So we need to delete 
> some attempts stored in RMStateStore and RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4402) TestNodeManagerShutdown And TestNodeManagerResync fails with bind exception

2015-12-14 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057257#comment-15057257
 ] 

Brahma Reddy Battula commented on YARN-4402:


Thanks a lot, [~jianhe], for the review and commit.

> TestNodeManagerShutdown And TestNodeManagerResync fails with bind exception
> ---
>
> Key: YARN-4402
> URL: https://issues.apache.org/jira/browse/YARN-4402
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: YARN-4402.patch
>
>
> https://builds.apache.org/job/Hadoop-Yarn-trunk/1465/testReport/
> {noformat}
> 2015-12-01 04:56:07,150 INFO  [main] http.HttpServer2 
> (HttpServer2.java:start(846)) - HttpServer.start() threw a non Bind 
> IOException
> java.net.BindException: Port in use: 0.0.0.0:8042
>   at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:906)
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:843)
>   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer.serviceStart(WebServer.java:73)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:368)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.testContainerPreservationOnResyncImpl(TestNodeManagerResync.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.testKillContainersOnResync(TestNodeManagerResync.java:141)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4453) TestMiniYarnClusterNodeUtilization occasionally times out in trunk

2015-12-14 Thread Sunil G (JIRA)
Sunil G created YARN-4453:
-

 Summary: TestMiniYarnClusterNodeUtilization occasionally times out 
in trunk
 Key: YARN-4453
 URL: https://issues.apache.org/jira/browse/YARN-4453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Sunil G


TestMiniYarnClusterNodeUtilization failures are observed in a few test runs in 
YARN-4293. 
Locally, the same test case is also timing out.
{noformat}
java.lang.Exception: test timed out after 6 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:158)
at com.sun.proxy.$Proxy85.nodeHeartbeat(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:113)
{noformat}

YARN-3980, where this test was added, reported a few timed-out cases. I think this 
needs to be investigated, because it does not look good to simply increase test 
timeouts when tests fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4452) NPE when submit Unmanaged application

2015-12-14 Thread Lin Wen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056030#comment-15056030
 ] 

Lin Wen commented on YARN-4452:
---

I can see the below information in YARN's log file:
{noformat}
2015-12-10 02:52:19,025 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Storing attempt: AppId: application_1449744734026_0001 AttemptId: 
appattempt_1449744734026_0001_01 MasterContainer: null
...
2015-12-10 02:52:19,946 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type REGISTERED for applicationAttempt 
application_1449744734026_0001
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:145)
{noformat}

I guess that since there is no container allocated for an "unmanaged" application 
master, MasterContainer is null. But when YARN registers this application attempt 
with SystemMetricsPublisher, it requires a container and its id. That's why this 
NullPointerException happens. The two methods involved are:
{code}
  private void storeAttempt() {
    // store attempt data in a non-blocking manner to prevent dispatcher
    // thread starvation and wait for state to be saved
    LOG.info("Storing attempt: AppId: "
        + getAppAttemptId().getApplicationId()
        + " AttemptId: " + getAppAttemptId()
        + " MasterContainer: " + masterContainer);
    rmContext.getStateStore().storeNewApplicationAttempt(this);
  }

  public void appAttemptRegistered(RMAppAttempt appAttempt,
      long registeredTime) {
    if (publishSystemMetrics) {
      dispatcher.getEventHandler().handle(
          new AppAttemptRegisteredEvent(
              appAttempt.getAppAttemptId(),
              appAttempt.getHost(),
              appAttempt.getRpcPort(),
              appAttempt.getTrackingUrl(),
              appAttempt.getOriginalTrackingUrl(),
              appAttempt.getMasterContainer().getId(),
              registeredTime));
    }
  }
{code}
In short, if an unmanaged AM tries to register with YARN when the timeline server 
is configured and "yarn.resourcemanager.system-metrics-publisher.enabled" is 
enabled, a java.lang.NullPointerException occurs in YARN.
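
A self-contained sketch of the kind of null guard that would avoid the NPE. The 
types below are stand-ins for the real Container/RMAppAttempt classes, and this is 
not the fix that was actually committed for this issue:
{code}
public class MasterContainerGuardSketch {
  /** Stand-in for org.apache.hadoop.yarn.api.records.Container. */
  static class Container {
    String getId() { return "container_1449744734026_0001_01_000001"; }
  }

  /** Stand-in for RMAppAttempt; an unmanaged AM never gets an AM container. */
  static class Attempt {
    private final Container masterContainer;
    Attempt(Container masterContainer) { this.masterContainer = masterContainer; }
    Container getMasterContainer() { return masterContainer; }
  }

  /** Guard the dereference instead of calling getMasterContainer().getId() blindly. */
  static String masterContainerIdOrNull(Attempt attempt) {
    Container master = attempt.getMasterContainer();
    return master == null ? null : master.getId();
  }

  public static void main(String[] args) {
    System.out.println(masterContainerIdOrNull(new Attempt(null)));            // null, no NPE
    System.out.println(masterContainerIdOrNull(new Attempt(new Container()))); // a container id
  }
}
{code}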

> NPE when submit Unmanaged application
> -
>
> Key: YARN-4452
> URL: https://issues.apache.org/jira/browse/YARN-4452
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>
> As reported in the forum by Wen Lin (w...@pivotal.io)
> {quote}
> [gpadmin@master simple-yarn-app]$ hadoop jar
> ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar
> Client --classpath  ./target/simple-yarn-app-1.1.0.jar -cmd "java
> com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
> {quote}
> error is coming as 
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type REGISTERED for applicationAttempt
> application_1450079798629_0001
> 664 java.lang.NullPointerException
> 665 at
> org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
> 666 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
> 667 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
> 668 at
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056028#comment-15056028
 ] 

Sunil G commented on YARN-3226:
---

Test case failures are known and have separate tickets to handle the same.
[~djp] and [~rohithsharma] pls help to review the same.

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-14 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056056#comment-15056056
 ] 

MENG DING commented on YARN-4138:
-

Hi, [~sandflee]

The proposed implementation of the token expiration and resource allocation 
rollback is effectively the same as resource allocation decrease. When the 
resource allocation of a container is decreased in RM, the AM will be notified 
in the next AM-RM heartbeat response. So AM should have a consistent view of 
the resource allocation eventually.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056059#comment-15056059
 ] 

Junping Du commented on YARN-3226:
--

Sure. Take your time. Thanks, Rohith!

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4452) NPE when submit Unmanaged application

2015-12-14 Thread Lin Wen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056023#comment-15056023
 ] 

Lin Wen commented on YARN-4452:
---

Here is how to reproduce it.
1. On Hadoop YARN, the timeline server is started/enabled and 
"yarn.resourcemanager.system-metrics-publisher.enabled" is enabled in 
yarn-site.xml:
{noformat}
<property>
  <description>The hostname of the timeline server web application.</description>
  <name>yarn.timeline-service.hostname</name>
  <value>master</value>
</property>

<property>
  <description>Enable or disable the GHS</description>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>

<property>
  <description>Enable or disable the Timeline Server.</description>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>

<property>
  <description>Store class name for timeline store</description>
  <name>yarn.timeline-service.store-class</name>
  <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
</property>

<property>
  <description>Store file name for leveldb timeline store</description>
  <name>yarn.timeline-service.leveldb-timeline-store.path</name>
  <value>/data/1/yarn/logs/timeline</value>
</property>
{noformat}
2. Use hortonworks' simple-yarn-app (https://github.com/hortonworks/simple-yarn-app) 
and start it in "unmanaged AM" mode:
{noformat}
hadoop jar 
~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar
 Client --classpath  ./target/simple-yarn-app-1.1.0.jar -cmd "java 
com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
{noformat}


> NPE when submit Unmanaged application
> -
>
> Key: YARN-4452
> URL: https://issues.apache.org/jira/browse/YARN-4452
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>
> As reported in the forum by Wen Lin (w...@pivotal.io)
> {quote}
> [gpadmin@master simple-yarn-app]$ hadoop jar
> ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar
> Client --classpath  ./target/simple-yarn-app-1.1.0.jar -cmd "java
> com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
> {quote}
> error is coming as 
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type REGISTERED for applicationAttempt
> application_1450079798629_0001
> 664 java.lang.NullPointerException
> 665 at
> org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
> 666 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
> 667 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
> 668 at
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056058#comment-15056058
 ] 

Rohith Sharma K S commented on YARN-3226:
-

Kindly wait for some time; I will take a look at the final patch.

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056050#comment-15056050
 ] 

Junping Du commented on YARN-3226:
--

+1. 004 patch LGTM. Will commit it shortly if no further feedback from others.

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056068#comment-15056068
 ] 

Rohith Sharma K S commented on YARN-3226:
-

Just one clarification: is keeping the old, now unused method 
{{RMNodeImpl#updateMetricsForGracefulDecommissionOnUnhealthyNode}} intentional?

Otherwise I am +1 for the patch.

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-12-14 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3816:
-
Attachment: (was: YARN-3816-feature-YARN-2928-v4.1.patch)

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, 
> YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4455) Support fetching metrics by time range

2015-12-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056340#comment-15056340
 ] 

Varun Saxena commented on YARN-4455:


This was something which was discussed earlier. Should we do this for events 
too ?

> Support fetching metrics by time range
> --
>
> Key: YARN-4455
> URL: https://issues.apache.org/jira/browse/YARN-4455
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail

2015-12-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4450:
--
Attachment: YARN-4450-feature-YARN-2928.01.patch

Patch v.1 posted.

> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
> -
>
> Key: YARN-4450
> URL: https://issues.apache.org/jira/browse/YARN-4450
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4450-feature-YARN-2928.01.patch
>
>
> When I run the unit tests against the current branch, 
> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail:
> {noformat}
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>  
> TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429
>  class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing 
> in yarn-default.xml
> {noformat}
> The latter failure is caused by YARN-4356 (when we deprecated 
> RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was 
> caused when a later use of field {{resURI}} was added in trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4455) Support fetching metrics by time range

2015-12-14 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4455:
--

 Summary: Support fetching metrics by time range
 Key: YARN-4455
 URL: https://issues.apache.org/jira/browse/YARN-4455
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4225:
-
Attachment: YARN-4225.005.patch

bq.Patch looks good, could you mark the findbugs warning needs to be skipped?
Thanks a lot, [~leftnoteasy]. Attaching YARN-4225.005.patch with findbugs 
suppressed for {{org.apache.hadoop.yarn.api.records.impl.pb: 
NP_BOOLEAN_RETURN_NULL}}

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2015-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056298#comment-15056298
 ] 

Hadoop QA commented on YARN-4414:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 35s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 2s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 26s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777496/YARN-4414.1.2.patch |
| JIRA Issue | YARN-4414 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  

[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056307#comment-15056307
 ] 

Rohith Sharma K S commented on YARN-3226:
-

+1. 0005 LGTM. pending jenkins..

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, 
> 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, 
> ClusterMetricsOnNodes_UI.png
>
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-12-14 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056745#comment-15056745
 ] 

zhihai xu commented on YARN-4209:
-

This issue won't affect the 2.6.x branch, since the RMStateStoreState.FENCED state 
was only added in the 2.7.x branch.

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch, YARN-4209.branch-2.7.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by 
> {{stateMachine.doTransition}}. The reason is
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in {{stateMachine.doTransition}} called from public 
> API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED state, the external state transition changes the state 
> back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} 
> =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} 
> change state to FENCED => exit external {{stateMachine.doTransition}} change 
> state to ACTIVE.
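
A toy, self-contained reproduction of the re-entrant transition described above. It 
does not use the real RMStateStore state machine, just two nested "transitions" to 
show how the outer one overwrites the FENCED state:
{code}
public class ReentrantTransitionSketch {
  enum State { ACTIVE, FENCED }

  private State state = State.ACTIVE;

  /** Inner transition: a failed store operation fences the store. */
  void notifyStoreOperationFailed() {
    state = State.FENCED;
  }

  /** Outer transition, e.g. triggered by removeRMDelegationToken. */
  void removeTokenTransition() {
    notifyStoreOperationFailed();  // nested transition sets FENCED...
    state = State.ACTIVE;          // ...but the outer transition finishes and overwrites it
  }

  public static void main(String[] args) {
    ReentrantTransitionSketch store = new ReentrantTransitionSketch();
    store.removeTokenTransition();
    System.out.println(store.state); // prints ACTIVE although FENCED was intended
  }
}
{code}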



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056762#comment-15056762
 ] 

Wangda Tan commented on YARN-4224:
--

Hi [~varun_saxena],

For your last comment: 
bq. So I looked at Wangda Tan's code at YARN-3368. I see that for single record 
like a single app attempt, we are extending urlForFindRecord and that takes 
only a single string id as input instead of an object as is the case with 
urlForQuery. In case of app attempt and containers, we can get both appid from 
app attempt id, and app attempt from container so a single id would do.
That's the major reason why I asked to support a flat namespace in the REST API. 
Yes, you're correct, the front-end JS library could support a multi-layer 
hierarchical REST API, but it's very painful. We would have to extend the JS 
library to support it, and we would need to keep the context of objects around (in 
your case we need username/cluster-id/flow-id when trying to get flow-related 
info). From my experience writing the web UI, this is very painful.

bq. Moreover, what do you mean by batch query ? Does that mean support for 
multiple optional query parameters like filters etc. to trim down the results ? 
We already have them.
I am not sure if it is possible to support queries like: give me the flows whose 
users satisfy a given regex and whose begin/end times fall within a given range. 
Could you give me an example of what such a query would look like?

In addition, I'm planning to propose adding flat-namespace REST APIs on the RM side 
as well (keeping the existing RM REST APIs unchanged for compatibility). For 
example, we should be able to get a container by id via 
{{/containers/\{container-id\}}} directly, instead of using the existing 
hierarchical REST API. My goal is to give the RM and ATSv2 a consistent REST API 
view.

Thoughts?


> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4224-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056772#comment-15056772
 ] 

Hadoop QA commented on YARN-4416:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 0s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 8s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 153m 1s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted

2015-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056649#comment-15056649
 ] 

Hadoop QA commented on YARN-4218:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-4218 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777564/YARN-4218.2.patch |
| JIRA Issue | YARN-4218 |
| Powered by | Apache Yetus 0.1.0   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9973/console |


This message was automatically generated.



> Metric for resource*time that was preempted
> ---
>
> Key: YARN-4218
> URL: https://issues.apache.org/jira/browse/YARN-4218
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, 
> YARN-4218.2.patch, YARN-4218.patch, YARN-4218.wip.patch, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> After YARN-415 we have the ability to track the resource*time footprint of a 
> job, and preemption metrics show how many containers were preempted from a job. 
> However we don't have a metric showing the resource*time footprint cost of 
> preemption. In other words, we know how many containers were preempted but we 
> don't have a good measure of how much work was lost as a result of preemption.
> We should add this metric so we can analyze how much work preemption is 
> costing on a grid and better track which jobs were heavily impacted by it. A 
> job that has 100 containers preempted that only lasted a minute each and were 
> very small is going to be less impacted than a job that only lost a single 
> container but that container was huge and had been running for 3 days.
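
To illustrate the kind of measure the description above is after, a small worked 
example (the memory-seconds unit and the numbers are illustrative only, not the 
patch's exact metric):

{code}
// Worked example of a resource*time comparison in "MB-seconds".
public class PreemptedResourceSecondsExample {
  public static void main(String[] args) {
    // 100 small containers, 1 GB (1024 MB) each, preempted after ~1 minute:
    long smallJobLoss = 100L * 1024 * 60;        // = 6,144,000 MB-seconds
    // 1 large container, 8 GB, preempted after 3 days:
    long bigJobLoss = 8L * 1024 * 3 * 24 * 3600; // = 2,123,366,400 MB-seconds
    System.out.println("small job lost " + smallJobLoss + " MB-seconds");
    System.out.println("big job lost   " + bigJobLoss + " MB-seconds");
    // The big job lost roughly 345x more resource*time than the small one,
    // even though it lost 1/100th as many containers - which is exactly
    // what a container count alone cannot show.
  }
}
{code}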



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4438) Implement RM leader election with curator

2015-12-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4438:
--
Attachment: YARN-4438.3.patch

> Implement RM leader election with curator
> -
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch
>
>
> This is to implement leader election with Curator instead of the 
> ActiveStandbyElector from the common package; this also avoids adding more 
> configs in common to suit the RM's own needs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-14 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056626#comment-15056626
 ] 

Li Lu commented on YARN-4224:
-

bq. Anyways from the GET side which is our immediate use case, if I understand, 
we will get a set of flows and send UID in the same response for later queries ?
Yes, how about putting them into an "otherinfo" field so that the front end can get 
this information? 

bq. If we have all the info, cluster, user, flow, etc. can't we create a URL of 
the form /cluster_id/fuser/flow_name ?
Having hierarchical IDs is possible in Ember, but in general it's not a common 
practice. On this point, maybe [~leftnoteasy] has comments? 

bq. Even if UID is required what should be the delimiter ? What if flow name 
has the same delimiter for instance. We need to handle it then.
That's something we need to consider if we'd like to pursue this approach. We 
may need to restrict some special characters in our cluster id/user name/flow 
names. 

bq. If we need this format for UI, should we have this REST endpoint in 
addition to our current REST endpoints(based on proposals above) for normal 
flow from clients ?
I'd prefer to have them as the only style of endpoints for timeline v2. Right now we 
have to spend some effort rebuilding the AHS REST endpoints in this style for the 
new UI. Since ATS v2 is starting out fresh, we don't need to handle the legacy use 
cases, do we? 

bq. Moreover, what do you mean by batch query ? Does that mean support for 
multiple optional query parameters like filters etc. to trim down the results ? 
We already have them.
Yes. Let's make sure they have the same style as other endpoints (proposed in 
this JIRA) though. I don't think we need much work underneath the wrapper 
layer. 
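
On the delimiter point above: if, say, '!' were chosen as the separator, the 
components could be escaped before joining. A small sketch (the delimiter, escape 
character and class name are illustrative, nothing is decided here):

{code}
public class UidCodecSketch {
  private static final char DELIMITER = '!';
  private static final char ESCAPE = '*';

  // Escape the escape character first, then the delimiter, then join.
  public static String joinUid(String... components) {
    StringBuilder uid = new StringBuilder();
    for (int i = 0; i < components.length; i++) {
      if (i > 0) {
        uid.append(DELIMITER);
      }
      String escaped = components[i]
          .replace(String.valueOf(ESCAPE), "" + ESCAPE + ESCAPE)
          .replace(String.valueOf(DELIMITER), "" + ESCAPE + DELIMITER);
      uid.append(escaped);
    }
    return uid.toString();
  }

  public static void main(String[] args) {
    // A '!' inside the cluster name no longer splits the UID:
    System.out.println(joinUid("yarn!cluster", "user1", "flow_name"));
    // prints: yarn*!cluster!user1!flow_name
  }
}
{code}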

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4224-YARN-2928.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4390) Consider container request size during CS preemption

2015-12-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-4390.
--
Resolution: Duplicate

Closing this ticket in favor of YARN-4108

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4218) Metric for resource*time that was preempted

2015-12-14 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-4218:
---
Attachment: YARN-4218.2.patch

> Metric for resource*time that was preempted
> ---
>
> Key: YARN-4218
> URL: https://issues.apache.org/jira/browse/YARN-4218
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, 
> YARN-4218.2.patch, YARN-4218.patch, YARN-4218.wip.patch, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> After YARN-415 we have the ability to track the resource*time footprint of a 
> job, and preemption metrics show how many containers were preempted from a job. 
> However we don't have a metric showing the resource*time footprint cost of 
> preemption. In other words, we know how many containers were preempted but we 
> don't have a good measure of how much work was lost as a result of preemption.
> We should add this metric so we can analyze how much work preemption is 
> costing on a grid and better track which jobs were heavily impacted by it. A 
> job that has 100 containers preempted that only lasted a minute each and were 
> very small is going to be less impacted than a job that only lost a single 
> container but that container was huge and had been running for 3 days.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications

2015-12-14 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055686#comment-15055686
 ] 

Varun Vasudev commented on YARN-4441:
-

Why? The reason the webservice implementation calls the RPC function is to 
avoid having different logic between the two. If the RPC implementation decides 
to log the call in the audit log then that logic applies to the webservices 
side as well. I agree with [~sunilg] and [~rohithsharma] - this doesn't seem 
like an issue.

> Kill application request from the webservice(ui) is showing success even for 
> the finished applications
> --
>
> Key: YARN-4441
> URL: https://issues.apache.org/jira/browse/YARN-4441
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
>
> If the application is already finished, i.e. either failed, killed, or 
> succeeded, the kill operation should not be logged as a success. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4309) Add container launch related debug information to container logs when a container fails

2015-12-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4309:
-
Summary: Add container launch related debug information to container logs 
when a container fails  (was: Add debug information to container logs when a 
container fails)

> Add container launch related debug information to container logs when a 
> container fails
> ---
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch, YARN-4309.010.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.
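
A rough sketch of the idea in the description above (the paths, file names and 
helper class are placeholders for illustration, not the actual ContainerExecutor 
changes in the patch):

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Rough sketch: on failure, write a listing of the container's local
// directory and a copy of the launch script into the container log dir,
// so normal log aggregation picks both up.
public class ContainerDebugInfoSketch {

  public static void dumpDebugInfo(Path containerLocalDir, Path containerLogDir)
      throws IOException {
    StringBuilder listing = new StringBuilder();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(containerLocalDir)) {
      for (Path entry : entries) {
        listing.append(entry.getFileName()).append('\n');
      }
    }
    Files.write(containerLogDir.resolve("directory.info"),
        listing.toString().getBytes(StandardCharsets.UTF_8));

    Path launchScript = containerLocalDir.resolve("launch_container.sh");
    if (Files.exists(launchScript)) {
      Files.copy(launchScript, containerLogDir.resolve("launch_container.sh"),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}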



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056513#comment-15056513
 ] 

Hadoop QA commented on YARN-4225:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 3s 
{color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 20s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 50, now 50). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 19s 
{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output 
file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 0s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 36s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 98m 35s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK 

[jira] [Comment Edited] (YARN-4309) Add container launch related debug information to container logs when a container fails

2015-12-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056515#comment-15056515
 ] 

Wangda Tan edited comment on YARN-4309 at 12/14/15 7:19 PM:


Committed to trunk/branch-2. Thanks [~vvasudev] and review from 
[~ste...@apache.org]/[~sidharta-s]/[~aw]/[~jlowe]/[~kasha]!


was (Author: leftnoteasy):
Committed to trunk/branch-2. Thanks [~vvasudev] and review from 
[~ste...@apache.org]/[~sidharta-s]/[~aw]!

> Add container launch related debug information to container logs when a 
> container fails
> ---
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch, YARN-4309.010.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056526#comment-15056526
 ] 

Wangda Tan commented on YARN-4418:
--

Looks good, thanks [~sunilg]. Committing..

> AM Resource Limit per partition can be updated to ResourceUsage as well
> ---
>
> Key: YARN-4418
> URL: https://issues.apache.org/jira/browse/YARN-4418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, 
> 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch
>
>
> AMResourceLimit is now extended to all partitions after YARN-3216. It's also 
> better to track this ResourceLimit in the existing {{ResourceUsage}} so that the 
> REST framework can easily expose this information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056524#comment-15056524
 ] 

Hudson commented on YARN-3946:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8962 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8962/])
YARN-3946. Update exact reason as to why a submitted app is in ACCEPTED 
(wangda: rev 6cb0af3c39a5d49cb2f7911ee21363a9542ca2d7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAMContainerLaunchDiagnosticsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimitsByPartition.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> Update exact reason as to why a submitted app is in ACCEPTED state to app's 
> diagnostic message
> --
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch, YARN-3946.v1.008.patch
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through the RM REST API which aspect is not being met - say, queue limits 
> being reached, core/memory requirements not being met, or the AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4226) Make capacity scheduler queue's preemption status REST API consistent with GUI

2015-12-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-4226.
--
Resolution: Won't Fix

Since the code works and is only slightly confusing, I am closing this ticket 
as WontFix.

> Make capacity scheduler queue's preemption status REST API consistent with GUI
> --
>
> Key: YARN-4226
> URL: https://issues.apache.org/jira/browse/YARN-4226
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
>
> In the capacity scheduler GUI, the preemption status has the following form:
> {code}
> Preemption:   disabled
> {code}
> However, the REST API shows the following for the same status:
> {code}
> preemptionDisabled":true
> {code}
> The latter is confusing and should be consistent with the format in the GUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message

2015-12-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056533#comment-15056533
 ] 

Naganarasimha G R commented on YARN-3946:
-

Thanks for the review and commit, [~wangda].

> Update exact reason as to why a submitted app is in ACCEPTED state to app's 
> diagnostic message
> --
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Fix For: 2.8.0
>
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch, YARN-3946.v1.008.patch
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through the RM REST API which aspect is not being met - say, queue limits 
> being reached, core/memory requirements not being met, or the AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-12-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056540#comment-15056540
 ] 

Wangda Tan commented on YARN-4225:
--

Thanks [~eepayne] for update.

Could you check whether the findbugs warning in the latest Jenkins run is related or 
not? There's no link to the findbugs result in the latest Jenkins report, so I guess 
it's not related.

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Attachments: YARN-4225.001.patch, YARN-4225.002.patch, 
> YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature

2015-12-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056552#comment-15056552
 ] 

Naganarasimha G R commented on YARN-4100:
-

Hi [~dian.fu], [~wangda] & [~devaraj.k],
Can you guys review the latest patch?

> Add Documentation for Distributed and Delegated-Centralized Node Labels 
> feature
> ---
>
> Key: YARN-4100
> URL: https://issues.apache.org/jira/browse/YARN-4100
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: NodeLabel.html, YARN-4100.v1.001.patch, 
> YARN-4100.v1.002.patch
>
>
> Add Documentation for Distributed Node Labels feature



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056568#comment-15056568
 ] 

Hudson commented on YARN-4418:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8963 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8963/])
YARN-4418. AM Resource Limit per partition can be updated to (wangda: rev 
07b0fb996a32020678bd2ce482b672f0434651f0)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/QueueCapacities.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueCapacities.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java


> AM Resource Limit per partition can be updated to ResourceUsage as well
> ---
>
> Key: YARN-4418
> URL: https://issues.apache.org/jira/browse/YARN-4418
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, 
> 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch
>
>
> AMResourceLimit is now extended to all partitions after YARN-3216. It's also 
> better to track this ResourceLimit in the existing {{ResourceUsage}} so that the 
> REST framework can easily expose this information. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4438) Implement RM leader election with curator

2015-12-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4438:
--
Attachment: YARN-4438.3.patch

Fixed some warnings
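
For context, leader election with Curator's LeaderLatch recipe can look roughly like 
the following sketch (the ZooKeeper quorum, election path and class name are 
placeholders, not the patch's actual code):

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Illustrative only: elect a leader among RM instances via a shared
// ZooKeeper path, and react to gaining/losing leadership.
public class CuratorElectionSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1:2181,zk2:2181,zk3:2181",          // placeholder quorum
        new ExponentialBackoffRetry(1000, 3)); // base sleep, max retries
    client.start();

    LeaderLatch latch = new LeaderLatch(client, "/yarn-leader-election/demo-cluster");
    latch.addListener(new LeaderLatchListener() {
      @Override
      public void isLeader() {
        // transition this RM to Active
        System.out.println("became leader");
      }
      @Override
      public void notLeader() {
        // transition this RM to Standby
        System.out.println("lost leadership");
      }
    });
    latch.start();

    Thread.sleep(Long.MAX_VALUE); // keep the process alive for the demo
  }
}
{code}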

> Implement RM leader election with curator
> -
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch
>
>
> This is to implement leader election with Curator instead of the 
> ActiveStandbyElector from the common package; this also avoids adding more 
> configs in common to suit the RM's own needs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056483#comment-15056483
 ] 

Wangda Tan commented on YARN-1011:
--

Thanks [~kasha], count me in :)! I could help with reviewing/implementation.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056470#comment-15056470
 ] 

Naganarasimha G R commented on YARN-4416:
-

Hi [~wangda], YARN-4416.v2.002.patch removed the synchronized lock on 
getNumApplications, but I presume there is a possibility that 
{{activateApplications}} can be called in between {{getNumPendingApplications}} and 
{{getNumActiveApplications}}, so the application count can be reported as a wrong 
value (*more than actual*). Shall I revert this?
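
To illustrate the concern, a minimal sketch (not the actual LeafQueue code, just the 
shape of the problem):

{code}
import java.util.ArrayList;
import java.util.List;

// Minimal illustration: each getter is synchronized on its own, but the
// *pair* of reads is not atomic, so an activateApplications() running
// between the two calls lets the caller count the moved app twice.
class QueueCountSketch {
  private final List<String> pendingApps = new ArrayList<>();
  private final List<String> activeApps = new ArrayList<>();

  synchronized void submit(String app) {
    pendingApps.add(app);
  }

  synchronized void activateApplications() {
    if (!pendingApps.isEmpty()) {
      activeApps.add(pendingApps.remove(0));
    }
  }

  synchronized int getNumPendingApplications() {
    return pendingApps.size();
  }

  synchronized int getNumActiveApplications() {
    return activeApps.size();
  }

  // Non-atomic composite read: if activateApplications() runs after the
  // first call and before the second, the same app is counted in both.
  int getNumApplicationsUnsafe() {
    return getNumPendingApplications() + getNumActiveApplications();
  }

  // Keeping the composite getter synchronized restores atomicity.
  synchronized int getNumApplications() {
    return pendingApps.size() + activeApps.size();
  }
}
{code}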

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, deadlock.log
>
>
> While debugging in Eclipse I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> Looking at the stack I realized there was a deadlock, but on analysis found 
> that it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Hence we need to ensure the following:
> # queueCapacity and resource-usage have their own read/write locks, hence 
> synchronization is not required.
> # numContainers is volatile, hence synchronization is not required.
> # A read/write lock could be added to OrderingPolicy. Read operations don't 
> need to be synchronized, so {{getNumApplications}} doesn't need to be 
> synchronized. 
> (The first two will be handled in this JIRA and the third will be handled in 
> YARN-4443)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056489#comment-15056489
 ] 

Wangda Tan commented on YARN-4416:
--

[~Naganarasimha], I would suggest reverting the change and deferring all 
OrderingPolicy-related changes to other JIRAs.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch, 
> deadlock.log
>
>
> While debugging in Eclipse I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> Looking at the stack I realized there was a deadlock, but on analysis found 
> that it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Hence we need to ensure the following:
> # queueCapacity and resource-usage have their own read/write locks, hence 
> synchronization is not required.
> # numContainers is volatile, hence synchronization is not required.
> # A read/write lock could be added to OrderingPolicy. Read operations don't 
> need to be synchronized, so {{getNumApplications}} doesn't need to be 
> synchronized. 
> (The first two will be handled in this JIRA and the third will be handled in 
> YARN-4443)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-14 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4416:

Attachment: YARN-4416.v2.003.patch

Reverting the removal of the lock on {{LeafQueue.getNumApplications}}.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch, 
> deadlock.log
>
>
> While debugging in Eclipse I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> Looking at the stack I realized there was a deadlock, but on analysis found 
> that it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Hence we need to ensure the following:
> # queueCapacity and resource-usage have their own read/write locks, hence 
> synchronization is not required.
> # numContainers is volatile, hence synchronization is not required.
> # A read/write lock could be added to OrderingPolicy. Read operations don't 
> need to be synchronized, so {{getNumApplications}} doesn't need to be 
> synchronized. 
> (The first two will be handled in this JIRA and the third will be handled in 
> YARN-4443)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4309) Add debug information to container logs when a container fails

2015-12-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4309:
-
Summary: Add debug information to container logs when a container fails  
(was: Add debug information to application logs when a container fails)

> Add debug information to container logs when a container fails
> --
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch, YARN-4309.010.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add container launch related debug information to container logs when a container fails

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056521#comment-15056521
 ] 

Hudson commented on YARN-4309:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8962 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8962/])
YARN-4309. Add container launch related debug information to container (wangda: 
rev dfcbbddb0963c89c0455d41223427165b9f9e537)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Add container launch related debug information to container logs when a 
> container fails
> ---
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, 
> YARN-4309.009.patch, YARN-4309.010.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the files 
> in the container local dir, and dump the contents of launch_container.sh (into 
> the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4438) Implement RM leader election with curator

2015-12-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4438:
--
Attachment: (was: YARN-4438.3.patch)

> Implement RM leader election with curator
> -
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4438.1.patch, YARN-4438.2.patch
>
>
> This is to implement leader election with Curator instead of the 
> ActiveStandbyElector from the common package; this also avoids adding more 
> configs in common to suit the RM's own needs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-12-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056362#comment-15056362
 ] 

Junping Du commented on YARN-3816:
--

Thanks [~sjlee0], [~varun_saxena] and Li for the comments. I am rebasing the patch on 
YARN-4356 and incorporating your comments above. Some quick responses to your major 
comments, to gather more feedback:
bq. It appears that the current code will aggregate metrics from all types of 
entities to the application. This seems problematic to me. The main goal of 
this aggregation is to roll up metrics from individual containers to the 
application. But just by having the same metric id, any entity can have its 
metric aggregated by this (incorrectly). For example, any arbitrary entity can 
simply declare a metric named "MEMORY". By virtue of that, it would get 
aggregated and added to the application-level value. There can be variations of 
this: for example, the same metrics can be reported by the container entity, 
app attempt entity, and so on. Then the values may be aggregated double or 
triple. I think we should ensure strongly that the aggregation happens only 
along the path of YARN container entities to application to prevent these 
accidental cases.
That sounds like a reasonable concern. I agree that we should avoid mixing up system 
metrics and the application's own metrics. However, I think our goal here is not just 
to aggregate/accumulate container metrics, but also to provide an aggregation service 
for applications' metrics (other than MR), isn't it? If so, maybe a better way is to 
aggregate metrics keyed not only on the metric name but also on its original entity 
type (so memory metrics for a ContainerEntity won't be aggregated together with 
memory metrics from an ApplicationEntity). [~sjlee0], what do you think?

bq. On a semi-related note, what happens if clients send metrics directly at 
the application entity level? We should expect most framework-specific AMs to 
do that. For example, MR AM already has all the job-level counters, and it can 
(and should) report those job-level counters as metrics at the YARN application 
entity. Is that case handled correctly, or will we end up getting incorrect 
values (double counting) in that situation?
That's why we need the toAggregate() API in TimelineMetric. For metrics that are 
already aggregated (like the MR AM's counters), the client should set it to false to 
avoid double counting. Sounds good?

bq. calculating area under the curve along the time dimension, would it be 
useful by itself? Average based on this area under the curve seems more useful.
Yes. Both the overall and the average values are useful from different standpoints. 
The former can be used to represent how many resources the application actually 
consumed, which is very useful for billing in a cloud service, etc. We can extend 
this later to more values if we think it is worth it. Varun, does that make sense?

bq. There are 3 types of aggregation basis, but only application aggregation 
has its own entity type. How do we represent the result entity of the other 2 
types?
I don't quite understand the question here. Li, are you suggesting we should remove 
the application aggregation entity type, add flow/queue aggregation entity types, or 
keep them consistent?
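
Coming back to the "area under the curve" point above, a minimal sketch of a 
time-weighted accumulator (illustrative only; the real accumulation in the 
aggregation code may differ):

{code}
// Sketch of accumulating "area under the curve" for a sampled metric:
// each new sample closes the interval since the previous one, during
// which the previous value is assumed to hold (a step function).
public class TimeWeightedAccumulator {
  private long lastTimestampMs = -1;
  private double lastValue;
  private double areaValueMillis; // sum of value * interval (value x ms)
  private long totalMillis;

  public void addSample(long timestampMs, double value) {
    if (lastTimestampMs >= 0) {
      long interval = timestampMs - lastTimestampMs;
      areaValueMillis += lastValue * interval;
      totalMillis += interval;
    }
    lastTimestampMs = timestampMs;
    lastValue = value;
  }

  /** Total accumulated value x time, e.g. MB x milliseconds. */
  public double area() {
    return areaValueMillis;
  }

  /** Time-weighted average of the metric over the observed window. */
  public double timeAverage() {
    return totalMillis == 0 ? 0 : areaValueMillis / totalMillis;
  }
}
{code}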

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, 
> YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other levels of aggregation (Flow/User/Queue) can be more efficiently based 
> on application-level aggregations rather than raw entity-level data, as many 
> fewer rows need to be scanned (after filtering out non-aggregated entities, 
> like events, configurations, etc.).



--
This message was sent by Atlassian 

[jira] [Commented] (YARN-3226) UI changes for decommissioning node

2015-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056420#comment-15056420
 ] 

Hadoop QA commented on YARN-3226:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 104, now 105). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 9s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 25s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 137m 22s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777484/0005-YARN-3226.patch |
| JIRA 

[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-12-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056430#comment-15056430
 ] 

Naganarasimha G R commented on YARN-4183:
-

Hi [~sjlee0], [~djp] & [~xgong], now that YARN-3623 is in, can we decide on 
this? Do we need to introduce another configuration to control whether client 
delegation tokens are fetched, in addition to the existing conditions (timeline 
service enabled and security enabled)? Or is it sufficient that clients can 
configure {{yarn.timeline-service.client.best-effort}} / 
{{yarn.timeline-service.enabled}} to false?
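
For illustration only, a minimal client-side sketch of that second option, 
using the config keys quoted above (the exact best-effort semantics are an 
assumption and should be checked against yarn-default.xml):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Rough sketch, not part of any patch: opt the client out of timeline
// delegation-token fetching via configuration.
public class TimelineOptOutSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Disable the timeline service on the client side so no timeline
    // delegation token is requested at job submission.
    conf.setBoolean("yarn.timeline-service.enabled", false);
    // Assumption: best-effort mode tolerates token-fetch failures instead of
    // failing the submission; verify the semantics against yarn-default.xml.
    conf.setBoolean("yarn.timeline-service.client.best-effort", true);
    System.out.println("yarn.timeline-service.enabled = "
        + conf.getBoolean("yarn.timeline-service.enabled", true));
  }
}
{code}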

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish events to the timeline store, 
> because it checks whether the timeline server and the system metrics 
> publisher are enabled before creating a timeline client.
> To make it work, the timeline service flag has to be turned on, which forces 
> every YARN application to get a delegation token.
> Instead of checking whether the timeline service is enabled, we should be 
> checking whether the application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail

2015-12-14 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056428#comment-15056428
 ] 

Sangjin Lee commented on YARN-4450:
---

Could I get a quick review on this? The changes are very straightforward.

> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
> -
>
> Key: YARN-4450
> URL: https://issues.apache.org/jira/browse/YARN-4450
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4450-feature-YARN-2928.01.patch
>
>
> When I run the unit tests against the current branch, 
> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail:
> {noformat}
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>  
> TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429
>  class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing 
> in yarn-default.xml
> {noformat}
> The latter failure is caused by YARN-4356 (when we deprecated 
> RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was 
> caused when a later use of field {{resURI}} was added in trunk.
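
(For context, a rough sketch of the kind of comparison the failing test 
performs; this is not the actual TestConfigurationFieldsBase code, which also 
applies skip lists and extra filtering.)

{code:java}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Simplified illustration: reflect over the public String constants of
// YarnConfiguration and report any yarn.* key that is absent from
// yarn-default.xml (the failure mode reported above).
public class ConfigFieldsCheckSketch {
  public static void main(String[] args) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(ConfigFieldsCheckSketch.class.getResourceAsStream("/yarn-default.xml"));
    Set<String> xmlKeys = new HashSet<>();
    NodeList names = doc.getElementsByTagName("name");
    for (int i = 0; i < names.getLength(); i++) {
      xmlKeys.add(names.item(i).getTextContent().trim());
    }
    Class<?> yarnConf = Class.forName("org.apache.hadoop.yarn.conf.YarnConfiguration");
    for (Field f : yarnConf.getDeclaredFields()) {
      if (Modifier.isPublic(f.getModifiers()) && Modifier.isStatic(f.getModifiers())
          && f.getType() == String.class) {
        String value = (String) f.get(null);
        if (value != null && value.startsWith("yarn.") && !xmlKeys.contains(value)) {
          System.out.println("Missing in yarn-default.xml: " + value);
        }
      }
    }
  }
}
{code}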



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements

2015-12-14 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056432#comment-15056432
 ] 

Carlo Curino commented on YARN-4360:


Rebasing after YARN-4358 got committed. Note: [~imenache] is working on 
YARN-4359, so some of the ugly "instanceof" checks that you see in this patch 
are going to go away (as he moves the LowCostAligned agents forward).

> Improve GreedyReservationAgent to support "early" allocations, and 
> performance improvements 
> 
>
> Key: YARN-4360
> URL: https://issues.apache.org/jira/browse/YARN-4360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4360.2.patch, YARN-4360.3.patch, YARN-4360.patch
>
>
> The GreedyReservationAgent allocates "as late as possible". Per various 
> conversations, it seems useful to have a mirror behavior that allocates as 
> early as possible. Also in the process we leverage improvements from 
> YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which 
> significantly speeds up allocation.
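
(As a toy illustration of the two placement directions described above, not 
the agent's actual code: a latest-possible placement scans candidate start 
times backwards from the deadline, while an earliest-possible one scans 
forwards from the arrival time. All names below are illustrative, not YARN 
APIs.)

{code:java}
// Toy model: pick the first start time, scanning from the deadline backwards
// (late) or from the arrival forwards (early), at which the hypothetical
// fits(...) predicate says the stage's demand fits under the plan's capacity.
public class PlacementDirectionSketch {

  interface CapacityCheck {
    boolean fits(long start, long duration); // hypothetical capacity predicate
  }

  static long placeLate(long arrival, long deadline, long duration, CapacityCheck plan) {
    for (long start = deadline - duration; start >= arrival; start--) {
      if (plan.fits(start, duration)) {
        return start; // latest feasible start, i.e. "as late as possible"
      }
    }
    return -1; // no feasible placement
  }

  static long placeEarly(long arrival, long deadline, long duration, CapacityCheck plan) {
    for (long start = arrival; start + duration <= deadline; start++) {
      if (plan.fits(start, duration)) {
        return start; // earliest feasible start, i.e. "as early as possible"
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Trivial plan: capacity is free only when the stage fits inside [5, 20).
    CapacityCheck plan = (start, duration) -> start >= 5 && start + duration <= 20;
    System.out.println("late  = " + placeLate(0, 30, 4, plan));   // prints 16
    System.out.println("early = " + placeEarly(0, 30, 4, plan));  // prints 5
  }
}
{code}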



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail

2015-12-14 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056450#comment-15056450
 ] 

Li Lu commented on YARN-4450:
-

Patch LGTM. +1. Will commit shortly. 

> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
> -
>
> Key: YARN-4450
> URL: https://issues.apache.org/jira/browse/YARN-4450
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4450-feature-YARN-2928.01.patch
>
>
> When I run the unit tests against the current branch, 
> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail:
> {noformat}
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>  
> TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429
>  class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing 
> in yarn-default.xml
> {noformat}
> The latter failure is caused by YARN-4356 (when we deprecated 
> RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was 
> caused when a later use of field {{resURI}} was added in trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056479#comment-15056479
 ] 

Wangda Tan commented on YARN-4293:
--

Thanks [~sunilg],

One other comment:
- The InterfaceAudience for ResourceUtilization seems incorrect: if 
getResourceUtilization is public in NodeReport, ResourceUtilization should be 
public as well. Would it be better to mark all ResourceUtilization-related 
APIs as public and unstable?
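
For reference, the suggested marking would look roughly like the following 
sketch (the annotations are the standard Hadoop ones; the class body here is 
illustrative, not the real ResourceUtilization):

{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Sketch of the visibility marking discussed above: if NodeReport exposes this
// type through a Public method, the type itself should carry a matching
// audience annotation.
@InterfaceAudience.Public
@InterfaceStability.Unstable
public abstract class ResourceUtilizationSketch {
  public abstract int getPhysicalMemory();   // illustrative accessors only;
  public abstract int getVirtualMemory();    // the real ResourceUtilization
  public abstract float getCPU();            // record defines its own fields
}
{code}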

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4293.patch, 0002-YARN-4293.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail

2015-12-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056399#comment-15056399
 ] 

Hadoop QA commented on YARN-4450:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
18s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
42s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 50s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 45s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:7c86163 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777508/YARN-4450-feature-YARN-2928.01.patch
 |
| JIRA Issue | YARN-4450 |
| 

[jira] [Commented] (YARN-4194) Extend Reservation Definition Language (RDL) extensions to support node labels

2015-12-14 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056449#comment-15056449
 ] 

Carlo Curino commented on YARN-4194:


[~atumanov], thanks for contributing this. In general, the patch looks good to 
me. One nit: we extend the language without extending 
{{ReservationInputValidator.validateReservationDefinition(..)}} accordingly. 
Are you planning to add that in one of the other JIRAs under the YARN-4193 
umbrella, or should we have it as part of this JIRA?
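
(A rough sketch of the kind of validation being asked about; the method and 
field names here are placeholders, not existing YARN APIs, and the actual 
check would live alongside the existing reservation-definition validation.)

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: reject a reservation definition whose node-label
// expression names a label the cluster does not know about.
public final class ReservationLabelValidationSketch {

  static void validateLabelExpression(String labelExpression, Set<String> knownLabels) {
    if (labelExpression == null || labelExpression.isEmpty()) {
      return; // no label requested: nothing to validate
    }
    if (!knownLabels.contains(labelExpression)) {
      throw new IllegalArgumentException(
          "Reservation requests unknown node label: " + labelExpression);
    }
  }

  public static void main(String[] args) {
    Set<String> labels = new HashSet<>(Arrays.asList("gpu", "ssd"));
    validateLabelExpression("gpu", labels);      // passes
    try {
      validateLabelExpression("fpga", labels);   // rejected
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}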

> Extend Reservation Definition Language (RDL) extensions to support node labels
> --
>
> Key: YARN-4194
> URL: https://issues.apache.org/jira/browse/YARN-4194
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Alexey Tumanov
> Attachments: YARN-4194-v1.patch, YARN-4194-v2.patch
>
>
> This JIRA tracks changes to the APIs to the reservation system to support
> the expressivity of node-labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail

2015-12-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056452#comment-15056452
 ] 

Naganarasimha G R commented on YARN-4450:
-

[~sjlee0],
I tested locally and both cases seem to pass after applying the patch, but as 
you mentioned, the latter one is for a fix in trunk, so do we need to put a 
patch in trunk?

> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
> -
>
> Key: YARN-4450
> URL: https://issues.apache.org/jira/browse/YARN-4450
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4450-feature-YARN-2928.01.patch
>
>
> When I run the unit tests against the current branch, 
> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail:
> {noformat}
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>  
> TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429
>  class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing 
> in yarn-default.xml
> {noformat}
> The latter failure is caused by YARN-4356 (when we deprecated 
> RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was 
> caused when a later use of field {{resURI}} was added in trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail

2015-12-14 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056458#comment-15056458
 ] 

Sangjin Lee commented on YARN-4450:
---

No, it is not an issue with the trunk. What's done in the trunk is correct. 
When we rebased our feature branch with the trunk, we failed to modify the trunk 
change according to the change we're making (not using resURI). So the issue is 
solely on our branch. Hope that helps.

> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
> -
>
> Key: YARN-4450
> URL: https://issues.apache.org/jira/browse/YARN-4450
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4450-feature-YARN-2928.01.patch
>
>
> When I run the unit tests against the current branch, 
> TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail:
> {noformat}
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>   TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » 
> NullPointer
>  
> TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429
>  class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing 
> in yarn-default.xml
> {noformat}
> The latter failure is caused by YARN-4356 (when we deprecated 
> RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was 
> caused when a later use of field {{resURI}} was added in trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

