[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.
[ https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057006#comment-15057006 ] Hudson commented on YARN-4439: -- FAILURE: Integrated in Hadoop-trunk-Commit #8967 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8967/]) YARN-4439. Clarify NMContainerStatus#toString method. Contributed by (xgong: rev d8a45425eba372cdebef3be50436b6ddf1c4e192) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java > Clarify NMContainerStatus#toString method. > -- > > Key: YARN-4439 > URL: https://issues.apache.org/jira/browse/YARN-4439 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.3 > > Attachments: YARN-4439.1.patch, YARN-4439.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057233#comment-15057233 ] Sunil G commented on YARN-3226: --- Test case failures are known and not related to this patch. [~djp], [~rohithsharma] kindly help to check the same. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, > ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.
[ https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057119#comment-15057119 ] Xuan Gong commented on YARN-4439: - Committed into branch-2.8 > Clarify NMContainerStatus#toString method. > -- > > Key: YARN-4439 > URL: https://issues.apache.org/jira/browse/YARN-4439 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.3 > > Attachments: YARN-4439.1.patch, YARN-4439.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil
[ https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Paduano updated YARN-4435: -- Attachment: proposed_solution > Add RM Delegation Token DtFetcher Implementation for DtUtil > --- > > Key: YARN-4435 > URL: https://issues.apache.org/jira/browse/YARN-4435 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Attachments: proposed_solution > > > Add a class to yarn project that implements the DtFetcher interface to return > a RM delegation token object. > I attached a proposed class implementation that does this, but it cannot be > added as a patch until the interface is merged in HADOOP-12563 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil
[ https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Paduano updated YARN-4435: -- Attachment: (was: proposed_solution) > Add RM Delegation Token DtFetcher Implementation for DtUtil > --- > > Key: YARN-4435 > URL: https://issues.apache.org/jira/browse/YARN-4435 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Matthew Paduano >Assignee: Matthew Paduano > Attachments: proposed_solution > > > Add a class to yarn project that implements the DtFetcher interface to return > a RM delegation token object. > I attached a proposed class implementation that does this, but it cannot be > added as a patch until the interface is merged in HADOOP-12563 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well
[ https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057037#comment-15057037 ] Hudson commented on YARN-4418: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #692 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/692/]) YARN-4418. AM Resource Limit per partition can be updated to (wangda: rev 07b0fb996a32020678bd2ce482b672f0434651f0) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueCapacities.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/QueueCapacities.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java > AM Resource Limit per partition can be updated to ResourceUsage as well > --- > > Key: YARN-4418 > URL: https://issues.apache.org/jira/browse/YARN-4418 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, > 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch > > > AMResourceLimit is now extended to all partitions after YARN-3216. 
It's also > better to track this ResourceLimit in the existing {{ResourceUsage}} so that the REST > framework can easily make use of this information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057039#comment-15057039 ] Hudson commented on YARN-3946: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #692 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/692/]) YARN-3946. Update exact reason as to why a submitted app is in ACCEPTED (wangda: rev 6cb0af3c39a5d49cb2f7911ee21363a9542ca2d7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAMContainerLaunchDiagnosticsConstants.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimitsByPartition.java > Update exact reason as to why a submitted app is in ACCEPTED state to app's > diagnostic message > -- > > Key: YARN-3946 > URL: 
https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Fix For: 2.8.0 > > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, > YARN-3946.v1.007.patch, YARN-3946.v1.008.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4309) Add container launch related debug information to container logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057034#comment-15057034 ] Hudson commented on YARN-4309: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #692 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/692/]) YARN-4309. Add container launch related debug information to container (wangda: rev dfcbbddb0963c89c0455d41223427165b9f9e537) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Add container launch related debug information to container logs when a > container fails > --- > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, > YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, > YARN-4309.009.patch, YARN-4309.010.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why 
it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such an approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh (into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
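The proposal above (list the container local dir and dump launch_container.sh into the container log dir, so aggregation collects it) can be sketched roughly as follows. This is a hedged, standalone illustration: the class and method names are hypothetical, and the actual change lands in ContainerExecutor/ContainerLaunch per the commit file list earlier in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Rough sketch: on container failure, list the container local dir and copy
// launch_container.sh into the container log dir so log aggregation later
// picks it up alongside the other container logs.
public class ContainerDebugDump {

    // Names of all entries directly under the container local dir.
    static List<String> listLocalDir(Path localDir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(localDir)) {
            for (Path p : ds) {
                names.add(p.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Copy launch_container.sh (if present) into the log dir; returns
    // whether there was a script to dump.
    static boolean dumpLaunchScript(Path localDir, Path logDir) {
        Path script = localDir.resolve("launch_container.sh");
        if (!Files.exists(script)) {
            return false;
        }
        try {
            Files.copy(script, logDir.resolve(script.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    // Self-contained demo: fakes a container local dir with a launch script,
    // dumps it, and reports whether the copy landed in the log dir.
    static boolean demo() {
        try {
            Path local = Files.createTempDirectory("container-local");
            Path logs = Files.createTempDirectory("container-logs");
            Files.write(local.resolve("launch_container.sh"),
                    "#!/bin/bash\necho launching\n".getBytes());
            return listLocalDir(local).contains("launch_container.sh")
                    && dumpLaunchScript(local, logs)
                    && Files.exists(logs.resolve("launch_container.sh"));
        } catch (IOException e) {
            return false;
        }
    }
}
```

Listing plus a single-file copy sidesteps the tar-the-whole-directory space concern raised in the description.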
[jira] [Comment Edited] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057106#comment-15057106 ] Jian He edited comment on YARN-4138 at 12/15/15 1:06 AM: - {code} SchedContainerChangeRequest decreaseRequest = new SchedContainerChangeRequest( schedulerNode, rmContainer, rmContainer.getLastConfirmedResource()); decreaseContainer(decreaseRequest, getCurrentAttemptForContainer(containerId)); {code} - this scenario may cause resource accounting wrong, correct me if I'm wrong: 1) AM asks increase 2G -> 8G 2) AM does not increase the container, and asks decrease to 1G 3) LastConfirmedResource becomes 1G 4) In the meantime, containerIncreaseExpiration logic is triggered and rollbackContainerResource is invoked. In this case the resource Delta becomes positive even in decrease case, but some code is assuming the decrease to be negative, which may cause resource accounting wrong ? {code} // Delta capacity is negative when it's a decrease request Resource absDelta = Resources.negate(decreaseRequest.getDeltaCapacity()); {code} - I have a question about the API semantics for the above mentioned scenario. According to the AMRMClient#requestContainerResourceChange API, the previous pending resource-change-request should be cancelled. Essentially, the semantics is a setter API. In that sense, the previous 8G should be cancelled. With this approach, both resource-change-requests are cancelled. That is,10 min later after the expiration is triggered, user will suddenly see its container decreased to 2 GB. will this confuse the user ? 
- revert format only changes in RMContainerChangeResourceEvent was (Author: jianhe): {code} SchedContainerChangeRequest decreaseRequest = new SchedContainerChangeRequest( schedulerNode, rmContainer, rmContainer.getLastConfirmedResource()); decreaseContainer(decreaseRequest, getCurrentAttemptForContainer(containerId)); {code} - this scenario may cause resource accounting wrong, correct me if I'm wrong: 1) AM asks increase 2G -> 8G 2) AM does not increase the container, and asks decrease to 1G 3) LastConfirmedResource becomes 1G 4) In the meantime, containerIncreaseExpiration logic is triggered and rollbackContainerResource is invoked. In this case the resource Delta becomes positive even in decrease case, but some code is assuming the decrease to be negative, which may cause resource accounting wrong ? {code} // Delta capacity is negative when it's a decrease request Resource absDelta = Resources.negate(decreaseRequest.getDeltaCapacity()); {code} - I have a question about the API semantics for the above mentioned scenario. According to the AMRMClient#requestContainerResourceChange API, the previous pending resource-change-request should be cancelled. Essentially, the semantics is a setter API. In that sense, the previous 8G should be cancelled. With this approach, both resource-change-requests are cancelled. That is,10 min later after the expiration is triggered, user will suddenly see its container decreased to 2 GB. will this confuse the user ? - revert RMContainerChangeResourceEvent > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. 
> This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
[ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057134#comment-15057134 ] Xianyin Xin commented on YARN-4415: --- Hi [~leftnoteasy], thanks for your comments. {quote} You can see that there're different pros and cons to choose default values of the two options. Frankly I don't have strong preference for all these choices. But since we have decided default values since 2.6, I would suggest don't change the default values. {quote} I understand and respect your choice. The pros and cons are just the two sides of a coin; we must choose one. But I find it strange that the access-labels are "\*" while in fact we can't access them, so in this case "\*" means nothing except that it is just a symbol, or an abbreviation of all labels. (What I mean is that it contradicts intuition when one sees "*"; I think Naga has the same sense.) You can claim that the access-labels and max-capacities are two things, and that if we want to use them, we must set them separately and explicitly. If we finally decide it should work this way, I will reserve my opinion. At last, thanks again. 
:) > Scheduler Web Ui shows max capacity for the queue is 100% but when we submit > application doesnt get assigned > > > Key: YARN-4415 > URL: https://issues.apache.org/jira/browse/YARN-4415 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.2 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: App info with diagnostics info.png, > capacity-scheduler.xml, screenshot-1.png > > > Steps to reproduce the issue : > Scenario 1: > # Configure a queue(default) with accessible node labels as * > # create a exclusive partition *xxx* and map a NM to it > # ensure no capacities are configured for default for label xxx > # start an RM app with queue as default and label as xxx > # application is stuck but scheduler ui shows 100% as max capacity for that > queue > Scenario 2: > # create a nonexclusive partition *sharedPartition* and map a NM to it > # ensure no capacities are configured for default queue > # start an RM app with queue as *default* and label as *sharedPartition* > # application is stuck but scheduler ui shows 100% as max capacity for that > queue for *sharedPartition* > For both issues cause is the same default max capacity and abs max capacity > is set to Zero % -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well
[ https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057201#comment-15057201 ] Sunil G commented on YARN-4418: --- Thank you very much [~leftnoteasy] for the review and commit. > AM Resource Limit per partition can be updated to ResourceUsage as well > --- > > Key: YARN-4418 > URL: https://issues.apache.org/jira/browse/YARN-4418 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, > 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch > > > AMResourceLimit is now extended to all partitions after YARN-3216. It's also > better to track this ResourceLimit in the existing {{ResourceUsage}} so that the REST > framework can easily make use of this information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Attachment: YARN-4340.v9.patch Thanks Subru for the comments. I have addressed your comments in this latest patch. After doing that, I noticed that I am not able to replicate the incorrect behavior you mentioned. > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch, > YARN-4340.v6.patch, YARN-4340.v7.patch, YARN-4340.v8.patch, YARN-4340.v9.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system for existing reservations by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057229#comment-15057229 ] sandflee commented on YARN-4138: Got it, thanks for the explanation! > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057144#comment-15057144 ] Eric Payne commented on YARN-4225: -- bq. Could you check findbugs warning in latest Jenkins run is related or not? There's no link to findbugs result in latest Jenkins report, so I guess it's not related. [~leftnoteasy], is there something wrong with this build? I can get to https://builds.apache.org/job/PreCommit-YARN-Build/9968, but many of the other links in the comment above don't work. For example, https://builds.apache.org/job/PreCommit-YARN-Build/9968/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66.txt gets 404. I tried to get to the artifacts page, but that comes up 404 also. I didn't find any findbugs report. > Add preemption status to yarn queue -status for capacity scheduler > -- > > Key: YARN-4225 > URL: https://issues.apache.org/jira/browse/YARN-4225 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: YARN-4225.001.patch, YARN-4225.002.patch, > YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057106#comment-15057106 ] Jian He commented on YARN-4138: --- {code} SchedContainerChangeRequest decreaseRequest = new SchedContainerChangeRequest( schedulerNode, rmContainer, rmContainer.getLastConfirmedResource()); decreaseContainer(decreaseRequest, getCurrentAttemptForContainer(containerId)); {code} - this scenario may cause resource accounting wrong, correct me if I'm wrong: 1) AM asks increase 2G -> 8G 2) AM does not increase the container, and asks decrease to 1G 3) LastConfirmedResource becomes 1G 4) In the meantime, containerIncreaseExpiration logic is triggered and rollbackContainerResource is invoked. In this case the resource Delta becomes positive even in decrease case, but some code is assuming the decrease to be negative, which may cause resource accounting wrong ? {code} // Delta capacity is negative when it's a decrease request Resource absDelta = Resources.negate(decreaseRequest.getDeltaCapacity()); {code} - I have a question about the API semantics for the above mentioned scenario. According to the AMRMClient#requestContainerResourceChange API, the previous pending resource-change-request should be cancelled. Essentially, the semantics is a setter API. In that sense, the previous 8G should be cancelled. With this approach, both resource-change-requests are cancelled. That is,10 min later after the expiration is triggered, user will suddenly see its container decreased to 2 GB. will this confuse the user ? 
- revert RMContainerChangeResourceEvent > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
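The sign concern raised in the review above can be illustrated in isolation: code that assumes a decrease request always carries a negative delta (and simply negates it) breaks if a rollback ever produces a positive delta, whereas taking a true absolute value is safe in either direction. The class and the MB numbers below are illustrative, not from the actual scheduler code.

```java
// Tiny standalone illustration of delta-sign handling for container
// resource changes. delta = target - current: negative for a decrease,
// positive for an increase.
public class DeltaSign {

    static long deltaMb(long currentMb, long targetMb) {
        return targetMb - currentMb;
    }

    // Unlike blindly negating (which assumes delta < 0 for a decrease),
    // Math.abs is correct regardless of which direction the change went.
    static long absDeltaMb(long currentMb, long targetMb) {
        return Math.abs(deltaMb(currentMb, targetMb));
    }
}
```

For example, rolling an 8G container back to 1G gives a delta of -7G (a genuine decrease), while negating an already-positive delta would silently yield a negative "absolute" value and corrupt the accounting.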
[jira] [Commented] (YARN-4247) Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing events
[ https://issues.apache.org/jira/browse/YARN-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057190#comment-15057190 ] Sangjin Lee commented on YARN-4247: --- FYI, those who port YARN-2005 without YARN-3361 will run into this issue pretty easily. If we ever decide to backport YARN-2005 to 2.6.x or 2.7.x, YARN-3361 needs to be backported too, or this should be fixed in the way this patch suggests. There are a couple of things that are not quite correct with the patch. - the call to {{hasMasterContainer()}} in {{SchedulerApplicationAttempt}} is inverted: it should be {{!hasMasterContainer()}} - {{masterContainer}} should be {{volatile}} to preserve memory visibility. Adding these comments for posterity. > Deadlock in FSAppAttempt and RMAppAttemptImpl causes RM to stop processing > events > - > > Key: YARN-4247 > URL: https://issues.apache.org/jira/browse/YARN-4247 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-4247.001.patch, YARN-4247.001.patch > > > We see this deadlock in our testing where events do not get processed and we > see this in the logs before the RM dies of OOM {noformat} 2015-10-08 > 04:48:01,918 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of > event-queue is 1488000 2015-10-08 04:48:01,918 INFO > org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1488000 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
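The second review point above can be shown with a minimal sketch: marking the field {{volatile}} lets a reader thread that checks it without taking the attempt's lock still see the latest write. The class below is a stand-in, not the real RM code, and the field type is a plain Object rather than the actual YARN Container.

```java
// Minimal visibility sketch: without volatile, a reader thread could see a
// stale null for masterContainer even after another thread has set it.
public class AttemptStub {

    private volatile Object masterContainer;

    public void setMasterContainer(Object container) {
        masterContainer = container;
    }

    // Safe to call from another thread without synchronization because the
    // field is volatile (writes happen-before subsequent reads).
    public boolean hasMasterContainer() {
        return masterContainer != null;
    }
}
```

Volatile here buys visibility without reacquiring the attempt lock, which is exactly what avoids re-entering the lock-ordering deadlock described in this issue.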
[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055734#comment-15055734 ] Mohammad Shahid Khan commented on YARN-4441: Hi [~Varun Vasudev], why send the kill request if the application is already finished? Please check the ApplicationCLI killApplication API; we have a similar check before invoking the RPC call to kill the app. {code} if (appReport.getYarnApplicationState() == YarnApplicationState.FINISHED || appReport.getYarnApplicationState() == YarnApplicationState.KILLED || appReport.getYarnApplicationState() == YarnApplicationState.FAILED) { sysout.println("Application " + applicationId + " has already finished "); } else { sysout.println("Killing application " + applicationId); client.killApplication(appId); } {code} > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055747#comment-15055747 ] Varun Vasudev commented on YARN-4441: - That's a performance optimization for the RPC client - it avoids an RPC round trip. There's nothing stopping you from writing a YARN client without that check. The equivalent to the CLI code you posted would be to grey out the button on the web UI if the application is finished (which is a patch I'd be ok with). > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
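The client-side optimization discussed in this thread boils down to treating FINISHED/KILLED/FAILED as terminal and skipping the kill RPC. A runnable distillation, using a stand-in enum rather than the real YarnApplicationState:

```java
import java.util.EnumSet;

// Stand-in for the terminal-state check in the ApplicationCLI snippet
// quoted earlier in this thread. AppState mimics YarnApplicationState.
public class KillGuard {

    enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, KILLED, FAILED }

    static final EnumSet<AppState> TERMINAL =
            EnumSet.of(AppState.FINISHED, AppState.KILLED, AppState.FAILED);

    // True when a kill request would be a no-op and the RPC round trip
    // can be skipped on the client side.
    static boolean alreadyFinished(AppState state) {
        return TERMINAL.contains(state);
    }
}
```

As the comment notes, this is a client convenience, not a server guarantee: a client that skips the check still gets a "successful" kill response for a finished app.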
[jira] [Commented] (YARN-4451) Some improvements required in Dump scheduler logs
[ https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055676#comment-15055676 ] Varun Vasudev commented on YARN-4451: - Agree with points (1), (2), and (5). Disagree with point (3). One of the reasons we used the same name is to avoid filling up the disk and having to manage the disk space. With regard to point (4) - I didn't test the feature to make sure it works with FairScheduler - have you checked that it generates the logs correctly? > Some improvements required in Dump scheduler logs > - > > Key: YARN-4451 > URL: https://issues.apache.org/jira/browse/YARN-4451 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > > Though dumping scheduler logs is a very useful option, there are a few nits in > using it > * for a naive or first-time user it's hard to understand whether {{"Time"}} > stands for past or future; IMO it would be slightly better to set the name in > the UI as {{"Time Period"}} > * the success message should give where the logs will be found and the file name > * Need to append the time stamp and the period to the file name, so that it's > not overwritten > * From the code it seems like it always returns {{"Capacity scheduler logs are > being created"}} even though the fair scheduler is set > * Would having a CLI option in {{"yarn rmadmin"}} also be helpful? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4452) NPE when submit Unmanaged application
Naganarasimha G R created YARN-4452: --- Summary: NPE when submit Unmanaged application Key: YARN-4452 URL: https://issues.apache.org/jira/browse/YARN-4452 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Critical As reported in the forum by Wen Lin (w...@pivotal.io):
{quote}
[gpadmin@master simple-yarn-app]$ hadoop jar ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar Client --classpath ./target/simple-yarn-app-1.1.0.jar -cmd "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2"
{quote}
The error is:
{code}
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type REGISTERED for applicationAttempt application_1450079798629_0001
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
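A plausible reading of the trace above - an inference, not something stated in the report - is that an unmanaged AM registers without a master container, and a publisher unconditionally dereferences it. A minimal sketch of that failure mode and a defensive variant, with `Container` and `AttemptInfo` as illustrative stand-ins rather than the actual RM classes:

```java
// Hypothetical sketch of the suspected failure mode in YARN-4452: an unmanaged
// AM has no master container, so code that dereferences it without a check NPEs.
// Container and AttemptInfo are stand-ins, not the real ResourceManager types.
class UnmanagedAmGuard {
    static class Container {
        final String id;
        Container(String id) { this.id = id; }
    }

    static class AttemptInfo {
        final Container masterContainer; // null for unmanaged AMs
        AttemptInfo(Container c) { this.masterContainer = c; }
    }

    // Defensive variant: fall back to a placeholder instead of throwing an NPE
    // while publishing the attempt-registered event.
    static String masterContainerId(AttemptInfo attempt) {
        Container c = attempt.masterContainer;
        return (c == null) ? "N/A" : c.id;
    }

    public static void main(String[] args) {
        System.out.println(masterContainerId(new AttemptInfo(null)));                  // N/A
        System.out.println(masterContainerId(new AttemptInfo(new Container("c_01")))); // c_01
    }
}
```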
[jira] [Commented] (YARN-4451) Some improvements required in Dump scheduler logs
[ https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055722#comment-15055722 ] Varun Vasudev commented on YARN-4451: - bq. Well, I guessed that would be the reason, but the problem is it overwrites without any warning/alert! So either we need to provide a message that the file will be overwritten, or maybe while creating we can keep a configurable number of logs, say 5. Can you do a sizing test to see how big the log is? bq. One more issue is when I tried the REST URL "http:///ws/v1/cluster/scheduler/logs" using wget, I am getting WebApplicationException; just wanted to confirm whether I am missing something! What HTTP method are you using - that URL only supports POST. > Some improvements required in Dump scheduler logs > - > > Key: YARN-4451 > URL: https://issues.apache.org/jira/browse/YARN-4451 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > > Though dumping scheduler logs is a very useful option, there are a few nits in > using it > * for a naive or first-time user it's hard to understand whether {{"Time"}} > stands for past or future; IMO it would be slightly better to set the name in > the UI as {{"Time Period"}} > * the success message should give where the logs will be found and the file name > * Need to append the time stamp and the period to the file name, so that it's > not overwritten > * From the code it seems like it always returns {{"Capacity scheduler logs are > being created"}} even though the fair scheduler is set > * Would having a CLI option in {{"yarn rmadmin"}} also be helpful? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4451) Some improvements required in Dump scheduler logs
[ https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055722#comment-15055722 ] Varun Vasudev edited comment on YARN-4451 at 12/14/15 9:58 AM: --- bq. Well i guessed that would be the reason but the problem is it overwrites without any warning/alert ! So either we need to provide a message that file will be overwritten or may be while creating we can keep configurable number of logs, say 5. Can you do a sizing test to see how big the log is? bq. one more issue is when i tried the REST url "http:///ws/v1/cluster/scheduler/logs" using wget, i am getting WebApplicationException just wanted to confirm whether i missing something ! What http method are you using - that url only supports POST. was (Author: vvasudev): bq .Well i guessed that would be the reason but the problem is it overwrites without any warning/alert ! So either we need to provide a message that file will be overwritten or may be while creating we can keep configurable number of logs, say 5. Can you do a sizing test to see how big the log is? bq. one more issue is when i tried the REST url "http:///ws/v1/cluster/scheduler/logs" using wget, i am getting WebApplicationException just wanted to confirm whether i missing something ! What http method are you using - that url only supports POST. 
[jira] [Commented] (YARN-4451) Some improvements required in Dump scheduler logs
[ https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055713#comment-15055713 ] Naganarasimha G R commented on YARN-4451: - Hi [~vvasudev], thanks for the feedback. bq. One of the reasons we used the same name is to avoid filling up the disk and having to manage the disk space. Well, I guessed that would be the reason, but the problem is it overwrites without any warning/alert! So either we need to provide a message that the file will be overwritten, or maybe while creating we can keep a configurable number of logs, say 5. bq. I didn't test the feature to make sure it works with FairScheduler - have you checked that it generates the logs correctly? Neither did I; I was just checking the patch in the JIRA to see if it supports any CLI, and found this log/return message. Also, we observed that only DEBUG logs were present and *not* the INFO logs, and usually we do not put debug logs for the same info log message, so I was wondering whether it's feasible to collect INFO logs too in the same log file so that analysis is faster? And one more issue: when I tried the REST URL "http:///ws/v1/cluster/scheduler/logs" using wget, I am getting WebApplicationException; just wanted to confirm whether I am missing something! 
[jira] [Updated] (YARN-4324) AM hang more than 10 min was kill by RM
[ https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tangshangwen updated YARN-4324: --- Attachment: yarn-nodemanager-dumpam.log > AM hang more than 10 min was kill by RM > --- > > Key: YARN-4324 > URL: https://issues.apache.org/jira/browse/YARN-4324 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: tangshangwen > Attachments: yarn-nodemanager-dumpam.log > > > this is my logs > 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865 > 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: > job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING > 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition > ed from UNASSIGNED to KILLED > 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing > the event EventType: JOB_COMMIT > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a > signal. Signaling RMCommunicator and JobHistoryEventHandler. > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator > notified that iSignalled is: true > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator > isAMLastRetry: true > the hive map run 100% and return map 0% and the job failed! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3857: Fix Version/s: 2.6.4 > Memory leak in ResourceManager with SIMPLE mode > --- > > Key: YARN-3857 > URL: https://issues.apache.org/jira/browse/YARN-3857 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: mujunchao >Assignee: mujunchao >Priority: Critical > Labels: patch > Fix For: 2.7.2, 2.6.4 > > Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, > YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch > > > We register the ClientTokenMasterKey to avoid the client holding an invalid > ClientToken after RM restarts. In SIMPLE mode, we register the > Pair, but we never remove it from the HashMap, as > unregister only runs in secure mode, so the memory leak occurs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057421#comment-15057421 ] zhihai xu commented on YARN-3857: - Yes, this issue exists in 2.6.x, I just committed this patch to branch-2.6. > Memory leak in ResourceManager with SIMPLE mode > --- > > Key: YARN-3857 > URL: https://issues.apache.org/jira/browse/YARN-3857 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: mujunchao >Assignee: mujunchao >Priority: Critical > Labels: patch > Fix For: 2.7.2, 2.6.4 > > Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, > YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch > > > We register the ClientTokenMasterKey to avoid client may hold an invalid > ClientToken after RM restarts. In SIMPLE mode, we register > Pair, But we never remove it from HashMap, as > unregister only runing while in Security mode, so memory leak coming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
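The leak pattern described in this issue - register in every mode, unregister only when security is enabled - can be sketched with stand-in types. The class and method names below are illustrative, not the actual ResourceManager code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the YARN-3857 leak: entries are registered in both
// SIMPLE and secure mode, but removal is guarded by the security flag, so in
// SIMPLE mode the map only ever grows.
class TokenKeyLeak {
    private final Map<String, byte[]> masterKeys = new HashMap<>();
    private final boolean securityEnabled;

    TokenKeyLeak(boolean securityEnabled) { this.securityEnabled = securityEnabled; }

    // Registration happens unconditionally when an attempt starts.
    void register(String attemptId, byte[] key) {
        masterKeys.put(attemptId, key);
    }

    // Buggy variant: cleanup only runs in secure mode, so SIMPLE mode leaks.
    void unregisterBuggy(String attemptId) {
        if (securityEnabled) {
            masterKeys.remove(attemptId);
        }
    }

    // Fixed variant: always remove the entry when the attempt finishes.
    void unregisterFixed(String attemptId) {
        masterKeys.remove(attemptId);
    }

    int size() { return masterKeys.size(); }

    public static void main(String[] args) {
        TokenKeyLeak simple = new TokenKeyLeak(false); // SIMPLE mode
        simple.register("attempt_1", new byte[] {1});
        simple.unregisterBuggy("attempt_1");
        System.out.println(simple.size()); // 1 -> the entry leaks
        simple.unregisterFixed("attempt_1");
        System.out.println(simple.size()); // 0 -> cleaned up
    }
}
```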
[jira] [Updated] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3535: Fix Version/s: 2.6.4 > Scheduler must re-request container resources when RMContainer transitions > from ALLOCATED to KILLED > --- > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, fairscheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Fix For: 2.7.2, 2.6.4 > > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, > YARN-3535-002.patch, syslog.tgz, yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) Scheduler must re-request container resources when RMContainer transitions from ALLOCATED to KILLED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057536#comment-15057536 ] zhihai xu commented on YARN-3535: - Yes, this issue exists in 2.6.x, I just committed this patch to branch-2.6. > Scheduler must re-request container resources when RMContainer transitions > from ALLOCATED to KILLED > --- > > Key: YARN-3535 > URL: https://issues.apache.org/jira/browse/YARN-3535 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, fairscheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Peng Zhang >Assignee: Peng Zhang >Priority: Critical > Fix For: 2.7.2, 2.6.4 > > Attachments: 0003-YARN-3535.patch, 0004-YARN-3535.patch, > 0005-YARN-3535.patch, 0006-YARN-3535.patch, YARN-3535-001.patch, > YARN-3535-002.patch, syslog.tgz, yarn-app.log > > > During rolling update of NM, AM start of container on NM failed. > And then job hang there. > Attach AM logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.
[ https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056084#comment-15056084 ] Lars Francke commented on YARN-1051: Is there any documentation on this beside the design doc and the patch itself? I still have trouble fully understanding how this is implemented/used. > YARN Admission Control/Planner: enhancing the resource allocation model with > time. > -- > > Key: YARN-1051 > URL: https://issues.apache.org/jira/browse/YARN-1051 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, resourcemanager, scheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.6.0 > > Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, > YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, > techreport.pdf > > > In this umbrella JIRA we propose to extend the YARN RM to handle time > explicitly, allowing users to "reserve" capacity over time. This is an > important step towards SLAs, long-running services, workflows, and helps for > gang scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4451) Some improvements required in Dump scheduler logs
[ https://issues.apache.org/jira/browse/YARN-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055837#comment-15055837 ] Naganarasimha G R commented on YARN-4451: - bq. Can you do a sizing test to see how big the log is? It took around 3 MB for 5 minutes in a 7-node cluster with one app (and for a 3-node cluster it was also more or less the same with a single app). bq. What http method are you using - that url only supports POST. My mistake, I tried with GET. Also, if time is not passed, do we need to consider a default of 1 min? How about the other question: we observed that only DEBUG logs were present and not the INFO logs, and usually we do not put debug logs for the same info log message, so I was wondering whether it's feasible to collect INFO logs too in the same log file so that analysis is faster? > Some improvements required in Dump scheduler logs > - > > Key: YARN-4451 > URL: https://issues.apache.org/jira/browse/YARN-4451 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > > Though dumping scheduler logs is a very useful option, there are a few nits in > using it > * for a naive or first-time user it's hard to understand whether {{"Time"}} > stands for past or future; IMO it would be slightly better to set the name in > the UI as {{"Time Period"}} > * the success message should give where the logs will be found and the file name > * Need to append the time stamp and the period to the file name, so that it's > not overwritten > * From the code it seems like it always returns {{"Capacity scheduler logs are > being created"}} even though the fair scheduler is set > * Would having a CLI option in {{"yarn rmadmin"}} also be helpful? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
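The GET-vs-POST confusion above is easy to reproduce and avoid with plain `HttpURLConnection`. A sketch under assumptions: the RM address is a caller-supplied placeholder, and no form parameters are shown since this thread does not spell out the endpoint's parameter names:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch: the scheduler-log dump endpoint accepts only POST, so a
// plain wget (which issues GET by default) is rejected, matching the
// WebApplicationException observed above.
class SchedulerLogDump {
    static HttpURLConnection buildRequest(String rmAddress) {
        try {
            URL url = new URL("http://" + rmAddress + "/ws/v1/cluster/scheduler/logs");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST"); // a GET here is what triggers the error above
            conn.setDoOutput(true);        // POST bodies carry the request parameters
            return conn;                   // caller writes the body and connects
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // openConnection() does not touch the network, so this is safe to run offline.
        System.out.println(buildRequest("localhost:8088").getRequestMethod()); // POST
    }
}
```

With wget the equivalent is passing `--post-data` so the request goes out as POST rather than the default GET.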
[jira] [Commented] (YARN-4350) TestDistributedShell fails
[ https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056150#comment-15056150 ] Varun Saxena commented on YARN-4350: Discussed offline with Naga. He will cherry pick YARN-4392 into this branch first. I will commit it afterwards. > TestDistributedShell fails > -- > > Key: YARN-4350 > URL: https://issues.apache.org/jira/browse/YARN-4350 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-4350-feature-YARN-2928.001.patch, > YARN-4350-feature-YARN-2928.002.patch, YARN-4350-feature-YARN-2928.003.patch > > > Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. > There seem to be 2 distinct issues. > (1) testDSShellWithoutDomainV2* tests fail sporadically > These test fail more often than not if tested by themselves: > {noformat} > testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 30.998 sec <<< FAILURE! > java.lang.AssertionError: Application created event should be published > atleast once expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207) > {noformat} > They start happening after YARN-4129. I suspect this might have to do with > some timing issue. 
> (2) the whole test times out > If you run the whole TestDistributedShell test, it times out without fail. > This may or may not have to do with the port change introduced by YARN-2859 > (just a hunch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056099#comment-15056099 ] Sunil G commented on YARN-3226: --- Hi [~rohithsharma] Thanks for pointing that out. {{updateMetricsForGracefulDecommission}} is the new generic method which will handle what {{updateMetricsForGracefulDecommissionOnUnhealthyNode}} was doing; hence that method is not used. I will remove it as it's no longer needed. Will update a patch now. Is this ok? > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4414) Nodemanager connection errors are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4414: --- Attachment: YARN-4414.1.2.patch > Nodemanager connection errors are retried at multiple levels > > > Key: YARN-4414 > URL: https://issues.apache.org/jira/browse/YARN-4414 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Jason Lowe >Assignee: Chang Li > Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, > YARN-4414.1.patch > > > This is related to YARN-3238. Ran into more scenarios where connection > errors are being retried at multiple levels, like NoRouteToHostException. > The fix for YARN-3238 was too specific, and I think we need a more general > solution to catch a wider array of connection errors that can occur to avoid > retrying them both at the RPC layer and at the NM proxy layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056105#comment-15056105 ] Rohith Sharma K S commented on YARN-3226: - If unused, it is better to remove it else it becomes stale in the code. Thanks for the clarification:-) > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI
[ https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056129#comment-15056129 ] Sunil G commented on YARN-4293: --- Test failures are not related. I have verified locally and seen that a few are failing without this patch as well. I have raised a ticket for the same. I think the change is impacting more test suites and hence we are getting these timeouts. > ResourceUtilization should be a part of yarn node CLI > - > > Key: YARN-4293 > URL: https://issues.apache.org/jira/browse/YARN-4293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0001-YARN-4293.patch, 0002-YARN-4293.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056200#comment-15056200 ] Varun Saxena commented on YARN-4224: So I looked at [~leftnoteasy]'s code at YARN-3368. I see that for single record like a single app attempt, we are extending urlForFindRecord and that takes only a single string id as input instead of an object as is the case with urlForQuery. In case of app attempt and containers, we can get both appid from app attempt id, and app attempt from container so a single id would do. In our case no such relationship exists between cluster, user, flow, etc. Is this why we need UID ? And we want to fetch it from server side so that UID encoding can be easily changed in future ? Is my understanding correct ? By the way what are implications of calling query instead of findRecord ? I guess multiple fields can be passed when we call urlForQuery. Moreover, what do you mean by batch query ? Does that mean support for multiple optional query parameters like filters etc. to trim down the results ? We already have them. > Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4454) NM to nodelabel mapping going wrong after RM restart
Bibin A Chundatt created YARN-4454: -- Summary: NM to nodelabel mapping going wrong after RM restart Key: YARN-4454 URL: https://issues.apache.org/jira/browse/YARN-4454 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical *Steps to reproduce*
1. Create a cluster with 2 NMs
2. Add labels X,Y to the cluster
3. Replace the label of node 1 using ,x
4. Replace the label of node 1 by ,y
5. Again replace the label of node 1 by ,x
Check the cluster label mapping: HOSTNAME1 will be mapped with X. Now restart the RM 2 times; the node label mapping of HOSTNAME1:PORT changes to Y {noformat} 2015-12-14 17:17:54,901 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels:
[jira] [Updated] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3226: -- Attachment: 0005-YARN-3226.patch Attaching update patch addressing the comments from [~rohithsharma] > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, > ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4324) AM hang more than 10 min was kill by RM
[ https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057380#comment-15057380 ] tangshangwen commented on YARN-4324: Because the job failure is random, I dumped the AM jstack and pstack when the AM went from RUNNING to KILLING, and I uploaded my log. > AM hang more than 10 min was kill by RM > --- > > Key: YARN-4324 > URL: https://issues.apache.org/jira/browse/YARN-4324 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: tangshangwen > > this is my logs > 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865 > 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: > job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING > 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition > ed from UNASSIGNED to KILLED > 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing > the event EventType: JOB_COMMIT > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a > signal. Signaling RMCommunicator and JobHistoryEventHandler. > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator > notified that iSignalled is: true > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator > isAMLastRetry: true > the hive map run 100% and return map 0% and the job failed! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4324) AM hang more than 10 min was kill by RM
[ https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tangshangwen updated YARN-4324: --- Attachment: logs.rar I upload the new jstack and am logs > AM hang more than 10 min was kill by RM > --- > > Key: YARN-4324 > URL: https://issues.apache.org/jira/browse/YARN-4324 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: tangshangwen > Attachments: logs.rar, yarn-nodemanager-dumpam.log > > > this is my logs > 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865 > 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: > job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING > 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition > ed from UNASSIGNED to KILLED > 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing > the event EventType: JOB_COMMIT > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a > signal. Signaling RMCommunicator and JobHistoryEventHandler. > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator > notified that iSignalled is: true > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator > isAMLastRetry: true > the hive map run 100% and return map 0% and the job failed! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period
[ https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057337#comment-15057337 ] Hudson commented on YARN-4403: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #693 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/693/]) YARN-4403. (AM/NM/Container)LivelinessMonitor should use monotonic time (jianhe: rev 1cb3299b48a06a842aa3f6cf37ccf44a49af43b5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/SystemClock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/MonotonicClock.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NMLivelinessMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/ContainerAllocationExpirer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java > (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating > period > > > Key: YARN-4403 > URL: https://issues.apache.org/jira/browse/YARN-4403 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4403-v2.patch, YARN-4403.patch > > > Currently, (AM/NM/Container)LivelinessMonitor use current system time to > calculate a duration of expire which could be broken by settimeofday. We > should use Time.monotonicNow() instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
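The fix above replaces wall-clock timestamps with a monotonic time source when measuring expiry periods. A minimal standalone sketch of the idea, using plain JDK calls rather than Hadoop's actual SystemClock/MonotonicClock classes:

```java
public class MonotonicExpiry {
    // System.currentTimeMillis() follows the wall clock, so settimeofday
    // or an NTP step can make "now - lastHeartbeat" negative or huge and
    // falsely expire (or never expire) a liveliness monitor entry.
    // System.nanoTime() is monotonic: it only ever moves forward, which
    // makes it safe for computing durations.
    public static long monotonicNowMillis() {
        return System.nanoTime() / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        long lastPing = monotonicNowMillis();
        Thread.sleep(20); // stand-in for the interval between heartbeats
        long elapsed = monotonicNowMillis() - lastPing;
        // elapsed is a true duration, unaffected by wall-clock changes
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

Note that System.nanoTime() has an arbitrary origin, so only differences between two readings are meaningful; the absolute value should never be compared against wall-clock timestamps.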
[jira] [Commented] (YARN-4402) TestNodeManagerShutdown And TestNodeManagerResync fails with bind exception
[ https://issues.apache.org/jira/browse/YARN-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057339#comment-15057339 ] Hudson commented on YARN-4402: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #693 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/693/]) YARN-4402. TestNodeManagerShutdown And TestNodeManagerResync fails with (jianhe: rev 915cd6c3f43f32b3ee13aceee68b5e86455e79f2) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java > TestNodeManagerShutdown And TestNodeManagerResync fails with bind exception > --- > > Key: YARN-4402 > URL: https://issues.apache.org/jira/browse/YARN-4402 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: YARN-4402.patch > > > https://builds.apache.org/job/Hadoop-Yarn-trunk/1465/testReport/ > {noformat} > 2015-12-01 04:56:07,150 INFO [main] http.HttpServer2 > (HttpServer2.java:start(846)) - HttpServer.start() threw a non Bind > IOException > java.net.BindException: Port in use: 0.0.0.0:8042 > at > org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:906) > at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:843) > at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer.serviceStart(WebServer.java:73) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:368) > at > 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.testContainerPreservationOnResyncImpl(TestNodeManagerResync.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.testKillContainersOnResync(TestNodeManagerResync.java:141) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
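The underlying problem in the failure above is two test JVMs racing for the same hard-coded port (8042). A common remedy, sketched here with plain JDK sockets (this is an illustration of the technique, not the actual patch), is to bind to port 0 so the OS assigns a free ephemeral port:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
    // Binding to port 0 asks the kernel for any currently free port;
    // closing the probe socket releases it for the test server to use.
    // There is still a small window between probing and reuse, but it
    // removes the guaranteed collision of a fixed port shared by tests.
    public static int findFreePort() {
        try (ServerSocket probe = new ServerSocket(0)) {
            probe.setReuseAddress(true);
            return probe.getLocalPort();
        } catch (IOException e) {
            throw new RuntimeException("no free port available", e);
        }
    }
}
```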
[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.
[ https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057336#comment-15057336 ] Hudson commented on YARN-4439: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #693 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/693/]) YARN-4439. Clarify NMContainerStatus#toString method. Contributed by (xgong: rev d8a45425eba372cdebef3be50436b6ddf1c4e192) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java > Clarify NMContainerStatus#toString method. > -- > > Key: YARN-4439 > URL: https://issues.apache.org/jira/browse/YARN-4439 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.7.3 > > Attachments: YARN-4439.1.patch, YARN-4439.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4324) AM hang more than 10 min was kill by RM
[ https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057412#comment-15057412 ] Rohith Sharma K S commented on YARN-4324: - Thanks for the jstack report! Would you also provide the AM and RM logs? > AM hang more than 10 min was kill by RM > --- > > Key: YARN-4324 > URL: https://issues.apache.org/jira/browse/YARN-4324 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: tangshangwen > Attachments: yarn-nodemanager-dumpam.log > > > this is my logs > 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865 > 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: > job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING > 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition > ed from UNASSIGNED to KILLED > 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing > the event EventType: JOB_COMMIT > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a > signal. Signaling RMCommunicator and JobHistoryEventHandler. > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator > notified that iSignalled is: true > 2015-11-02 01:24:15,851 INFO [Thread-1] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator > isAMLastRetry: true > the hive map run 100% and return map 0% and the job failed! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057501#comment-15057501 ] Jun Gong commented on YARN-3480: [~jianhe] Thanks for the review and suggestions. {quote} how about removing the attempts that are beyond the max-allowed-attempts instead of the ones beyond the validity interval ? this way, we can keep more reasonable amount of history. {quote} OK. In earlier patches, I did it that way; max-allowed-attempts then becomes a global hard limit. {quote} Instead of introducing the dummyAttempt in the RMApp, we can change the caller to always find the current attempt for container by using AbstractYarnScheduler#getCurrentAttemptForContainer API. This way, the container events can be routed to the current attempts instead of old one. {quote} The current attempt might be in any state and cannot handle some container events; e.g., when the attempt is in RMAppAttemptState.NEW, it cannot handle the event RMAppAttemptEventType.CONTAINER_FINISHED. To avoid making the attempt's state transitions more complex, we introduce a 'dummyAttempt' that is in a final state (because it represents a finished attempt), e.g. RMAppAttemptState.FAILED, and can therefore handle any RMAppAttemptEventType.* event. Is that OK? > Recovery may get very slow with lots of services with lots of app-attempts > -- > > Key: YARN-3480 > URL: https://issues.apache.org/jira/browse/YARN-3480 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3480.01.patch, YARN-3480.02.patch, > YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, YARN-3480.06.patch > > > When RM HA is enabled and running containers are kept across attempts, apps > are more likely to finish successfully with more retries(attempts), so it > will be better to set 'yarn.resourcemanager.am.max-attempts' larger. 
However > it will make the RMStateStore (FileSystem/HDFS/ZK) store more attempts, and make > the RM recovery process much slower. It might be better to cap the number of attempts > stored in the RMStateStore. > BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to > a small value, the number of retried attempts might be very large, so we need to delete > some of the attempts stored in the RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
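The direction discussed above — capping the number of attempts kept in the state store rather than pruning by validity interval — can be sketched as follows. The class and method names here are illustrative only, not the actual YARN-3480 patch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StoredAttemptCap {
    private final Deque<String> storedAttemptIds = new ArrayDeque<>();
    private final int maxStoredAttempts; // the global hard limit

    public StoredAttemptCap(int maxStoredAttempts) {
        this.maxStoredAttempts = maxStoredAttempts;
    }

    // Store a new attempt and evict the oldest ones beyond the cap, so
    // RM recovery only ever replays a bounded amount of attempt history.
    public void storeAttempt(String attemptId) {
        storedAttemptIds.addLast(attemptId);
        while (storedAttemptIds.size() > maxStoredAttempts) {
            // the real change would also delete the evicted attempt's
            // znode/file from the RMStateStore
            storedAttemptIds.removeFirst();
        }
    }

    public int storedCount() {
        return storedAttemptIds.size();
    }
}
```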
[jira] [Commented] (YARN-4402) TestNodeManagerShutdown And TestNodeManagerResync fails with bind exception
[ https://issues.apache.org/jira/browse/YARN-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057257#comment-15057257 ] Brahma Reddy Battula commented on YARN-4402: thanks a lot [~jianhe] for review and commit. > TestNodeManagerShutdown And TestNodeManagerResync fails with bind exception > --- > > Key: YARN-4402 > URL: https://issues.apache.org/jira/browse/YARN-4402 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: YARN-4402.patch > > > https://builds.apache.org/job/Hadoop-Yarn-trunk/1465/testReport/ > {noformat} > 2015-12-01 04:56:07,150 INFO [main] http.HttpServer2 > (HttpServer2.java:start(846)) - HttpServer.start() threw a non Bind > IOException > java.net.BindException: Port in use: 0.0.0.0:8042 > at > org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:906) > at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:843) > at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer.serviceStart(WebServer.java:73) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:368) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.testContainerPreservationOnResyncImpl(TestNodeManagerResync.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync.testKillContainersOnResync(TestNodeManagerResync.java:141) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4453) TestMiniYarnClusterNodeUtilization occasionally times out in trunk
Sunil G created YARN-4453: - Summary: TestMiniYarnClusterNodeUtilization occasionally times out in trunk Key: YARN-4453 URL: https://issues.apache.org/jira/browse/YARN-4453 Project: Hadoop YARN Issue Type: Bug Components: test Reporter: Sunil G TestMiniYarnClusterNodeUtilization failures have been observed in a few test runs in YARN-4293. Locally, the same test case is also timing out. {noformat} java.lang.Exception: test timed out after 6 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:158) at com.sun.proxy.$Proxy85.nodeHeartbeat(Unknown Source) at org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:113) {noformat} YARN-3980, where this test was added, also reported a few timed-out cases. I think this should be investigated, since simply raising the timeout when a test fails is not a good fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4452) NPE when submit Unmanaged application
[ https://issues.apache.org/jira/browse/YARN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056030#comment-15056030 ] Lin Wen commented on YARN-4452: --- I can see the following information in YARN's log file: 2015-12-10 02:52:19,025 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1449744734026_0001 AttemptId: appattempt_1449744734026_0001_01 MasterContainer: null ... 2015-12-10 02:52:19,946 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type REGISTERED for applicationAttempt application_1449744734026_0001 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:145) I guess that since no container is allocated for an "unmanaged" application master, MasterContainer is null. But when YARN registers this application attempt with the SystemMetricsPublisher, it requires a container and its id. That is why the NullPointerException happens. 
{code}
private void storeAttempt() {
  // store attempt data in a non-blocking manner to prevent dispatcher
  // thread starvation and wait for state to be saved
  LOG.info("Storing attempt: AppId: "
      + getAppAttemptId().getApplicationId()
      + " AttemptId: " + getAppAttemptId()
      + " MasterContainer: " + masterContainer);
  rmContext.getStateStore().storeNewApplicationAttempt(this);
}

public void appAttemptRegistered(RMAppAttempt appAttempt, long registeredTime) {
  if (publishSystemMetrics) {
    dispatcher.getEventHandler().handle(
        new AppAttemptRegisteredEvent(
            appAttempt.getAppAttemptId(),
            appAttempt.getHost(),
            appAttempt.getRpcPort(),
            appAttempt.getTrackingUrl(),
            appAttempt.getOriginalTrackingUrl(),
            appAttempt.getMasterContainer().getId(),
            registeredTime));
  }
}
{code}
In short, when an unmanaged AM registers with YARN while the timeline server is configured and "yarn.resourcemanager.system-metrics-publisher.enabled" is enabled, a java.lang.NullPointerException occurs in the ResourceManager. > NPE when submit Unmanaged application > - > > Key: YARN-4452 > URL: https://issues.apache.org/jira/browse/YARN-4452 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > > As reported in the forum by Wen Lin (w...@pivotal.io) > {quote} > [gpadmin@master simple-yarn-app]$ hadoop jar > ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar > Client --classpath ./target/simple-yarn-app-1.1.0.jar -cmd "java > com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2" > {quote} > error is coming as > {code} > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type REGISTERED for applicationAttempt > application_1450079798629_0001 > 664 java.lang.NullPointerException > 665 at > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143) > 666 at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365) > 667 at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341) > 668 at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
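The failing call is appAttempt.getMasterContainer().getId() when no master container exists. A standalone sketch of the null-guard idea follows; the nested classes are toy stand-ins, not YARN's real RMAppAttempt/Container API, and this is only one possible fix:

```java
public class UnmanagedAmGuard {
    // Toy stand-ins for the YARN types involved; only the null-handling
    // below is the point of this sketch.
    static class ContainerId { final String value;
        ContainerId(String value) { this.value = value; } }
    static class Container { final ContainerId id;
        Container(ContainerId id) { this.id = id; } }
    static class Attempt { final Container masterContainer;
        Attempt(Container c) { this.masterContainer = c; } }

    // For an unmanaged AM no container is ever allocated, so
    // masterContainer is null and an unguarded getId() call reproduces
    // the NPE reported in SystemMetricsPublisher.appAttemptRegistered.
    // Returning null (or a sentinel) lets the registration event still
    // be published.
    static ContainerId safeMasterContainerId(Attempt attempt) {
        return attempt.masterContainer == null
            ? null
            : attempt.masterContainer.id;
    }
}
```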
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056028#comment-15056028 ] Sunil G commented on YARN-3226: --- Test case failures are known and have separate tickets to handle the same. [~djp] and [~rohithsharma] pls help to review the same. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056056#comment-15056056 ] MENG DING commented on YARN-4138: - Hi, [~sandflee] The proposed implementation of the token expiration and resource allocation rollback is effectively the same as resource allocation decrease. When the resource allocation of a container is decreased in RM, the AM will be notified in the next AM-RM heartbeat response. So AM should have a consistent view of the resource allocation eventually. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
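The equivalence described above — increase-token expiry handled exactly like a resource decrease, with the AM learning of it on the next heartbeat — can be sketched like this. The bookkeeping and names are illustrative only, not the YARN-4138 implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class IncreaseRollback {
    private final Map<String, Integer> allocatedMb = new HashMap<>();
    private final Map<String, Integer> preIncreaseMb = new HashMap<>();

    public void allocate(String containerId, int mb) {
        allocatedMb.put(containerId, mb);
    }

    // Granting an increase remembers the old allocation so it can be
    // restored if the AM never acts on the increase token.
    public void grantIncrease(String containerId, int newMb) {
        preIncreaseMb.put(containerId, allocatedMb.getOrDefault(containerId, 0));
        allocatedMb.put(containerId, newMb);
    }

    // Expiry rolls the allocation back; from the AM's point of view this
    // looks exactly like a decrease reported on the next heartbeat.
    public void onIncreaseTokenExpired(String containerId) {
        Integer previous = preIncreaseMb.remove(containerId);
        if (previous != null) {
            allocatedMb.put(containerId, previous);
        }
    }

    public int allocatedMb(String containerId) {
        return allocatedMb.getOrDefault(containerId, 0);
    }
}
```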
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056059#comment-15056059 ] Junping Du commented on YARN-3226: -- Sure. Take ur time. Thanks Rohith! > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4452) NPE when submit Unmanaged application
[ https://issues.apache.org/jira/browse/YARN-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056023#comment-15056023 ] Lin Wen commented on YARN-4452: --- Here is how to reproduce it. 1. On Hadoop YARN, the timeline server is started/enabled and "yarn.resourcemanager.system-metrics-publisher.enabled" is enabled in yarn-site.xml:
{code}
<property>
  <description>The hostname of the timeline server web application.</description>
  <name>yarn.timeline-service.hostname</name>
  <value>master</value>
</property>
<property>
  <description>Enable or disable the GHS</description>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>
<property>
  <description>Enable or disable the Timeline Server.</description>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <description>Store class name for timeline store</description>
  <name>yarn.timeline-service.store-class</name>
  <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
</property>
<property>
  <description>Store file name for leveldb timeline store</description>
  <name>yarn.timeline-service.leveldb-timeline-store.path</name>
  <value>/data/1/yarn/logs/timeline</value>
</property>
{code}
2. Use hortonworks' simple-yarn-app(https://github.com/hortonworks/simple-yarn-app), and start it in "unmanaged AM" mode: hadoop jar ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar Client --classpath ./target/simple-yarn-app-1.1.0.jar -cmd "java com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2" > NPE when submit Unmanaged application > - > > Key: YARN-4452 > URL: https://issues.apache.org/jira/browse/YARN-4452 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Critical > > As reported in the forum by Wen Lin (w...@pivotal.io) > {quote} > [gpadmin@master simple-yarn-app]$ hadoop jar > ~/hadoop/singlecluster/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.3.0.0.0-120.jar > Client --classpath ./target/simple-yarn-app-1.1.0.jar -cmd "java > com.hortonworks.simpleyarnapp.ApplicationMaster /bin/date 2" > {quote} > error is coming as > {code} > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type 
REGISTERED for applicationAttempt > application_1450079798629_0001 > 664 java.lang.NullPointerException > 665 at > org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143) > 666 at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365) > 667 at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341) > 668 at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056058#comment-15056058 ] Rohith Sharma K S commented on YARN-3226: - kindly wait for some time, I will take look at final patch > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056050#comment-15056050 ] Junping Du commented on YARN-3226: -- +1. 004 patch LGTM. Will commit it shortly if no further feedback from others. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056068#comment-15056068 ] Rohith Sharma K S commented on YARN-3226: - Just one clarification: is keeping the old method {{RMNodeImpl#updateMetricsForGracefulDecommissionOnUnhealthyNode}}, which is now unused, intentional? Otherwise I am +1 for the patch > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3816: - Attachment: (was: YARN-3816-feature-YARN-2928-v4.1.patch) > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, > YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4455) Support fetching metrics by time range
[ https://issues.apache.org/jira/browse/YARN-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056340#comment-15056340 ] Varun Saxena commented on YARN-4455: This was something which was discussed earlier. Should we do this for events too? > Support fetching metrics by time range > -- > > Key: YARN-4455 > URL: https://issues.apache.org/jira/browse/YARN-4455 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
[ https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4450: -- Attachment: YARN-4450-feature-YARN-2928.01.patch Patch v.1 posted. > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail > - > > Key: YARN-4450 > URL: https://issues.apache.org/jira/browse/YARN-4450 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4450-feature-YARN-2928.01.patch > > > When I run the unit tests against the current branch, > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail: > {noformat} > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > > TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429 > class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing > in yarn-default.xml > {noformat} > The latter failure is caused by YARN-4356 (when we deprecated > RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was > caused when a later use of field {{resURI}} was added in trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4455) Support fetching metrics by time range
Varun Saxena created YARN-4455: -- Summary: Support fetching metrics by time range Key: YARN-4455 URL: https://issues.apache.org/jira/browse/YARN-4455 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-4225: - Attachment: YARN-4225.005.patch bq.Patch looks good, could you mark the findbugs warning needs to be skipped? Thanks a lot, [~leftnoteasy]. Attaching YARN-4225.005.patch with findbugs suppressed for {{org.apache.hadoop.yarn.api.records.impl.pb: NP_BOOLEAN_RETURN_NULL}} > Add preemption status to yarn queue -status for capacity scheduler > -- > > Key: YARN-4225 > URL: https://issues.apache.org/jira/browse/YARN-4225 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: YARN-4225.001.patch, YARN-4225.002.patch, > YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4414) Nodemanager connection errors are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056298#comment-15056298 ] Hadoop QA commented on YARN-4414: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 35s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 2s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 26s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12777496/YARN-4414.1.2.patch | | JIRA Issue | YARN-4414 | | Optional Tests | asflicense compile javac javadoc mvninstall
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056307#comment-15056307 ] Rohith Sharma K S commented on YARN-3226: - +1. 0005 LGTM. pending jenkins.. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, 0004-YARN-3226.patch, 0005-YARN-3226.patch, > ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition
[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056745#comment-15056745 ] zhihai xu commented on YARN-4209: - This issue won't affect 2.6.x branch, since RMStateStoreState.FENCED state is only added at 2.7.x branch. > RMStateStore FENCED state doesn’t work due to updateFencedState called by > stateMachine.doTransition > --- > > Key: YARN-4209 > URL: https://issues.apache.org/jira/browse/YARN-4209 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-4209.000.patch, YARN-4209.001.patch, > YARN-4209.002.patch, YARN-4209.branch-2.7.patch > > > RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by > {{stateMachine.doTransition}}. The reason is > {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded > in {{stateMachine.doTransition}} called from public > API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So > right after the internal state transition from {{updateFencedState}} changes > the state to FENCED state, the external state transition changes the state > back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE > state even after {{notifyStoreOperationFailed}} is called. The only working > case for FENCED state is {{notifyStoreOperationFailed}} called from > {{ZKRMStateStore#VerifyActiveStatusThread}}. > For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter > external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => > {{notifyStoreOperationFailed}} > =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal > {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} > change state to FENCED => exit external {{stateMachine.doTransition}} change > state to ACTIVE. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
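The re-entrant transition zhihai xu describes can be sketched in plain Java. The class and method names below are hypothetical stand-ins for RMStateStore's state machine, not the actual Hadoop code: the nested {{doTransition}} sets FENCED, but the outer call then applies its own post-state and silently reverts to ACTIVE.

```java
// Minimal sketch of the re-entrancy bug: an outer transition's handler
// triggers a nested transition, and the outer transition's post-state
// write clobbers whatever the nested transition set.
public class ReentrantTransitionDemo {
    enum State { ACTIVE, FENCED }

    static class StoreStateMachine {
        State state = State.ACTIVE;

        // Outer transition, e.g. triggered by removeRMDelegationToken().
        void doTransition(Runnable body, State postState) {
            body.run();        // handler may re-enter doTransition()
            state = postState; // outer write overwrites the nested one
        }

        void notifyStoreOperationFailed() {
            // Nested transition: updateFencedState() -> doTransition()
            doTransition(() -> { }, State.FENCED);
        }
    }

    public static void main(String[] args) {
        StoreStateMachine sm = new StoreStateMachine();
        // The outer transition's handler fails, fences the store,
        // then the outer transition completes with post-state ACTIVE.
        sm.doTransition(sm::notifyStoreOperationFailed, State.ACTIVE);
        System.out.println(sm.state); // ACTIVE, even though fencing ran
    }
}
```

This is why, per the comment above, the only working FENCED path in 2.7.x is the one that calls {{notifyStoreOperationFailed}} from outside any in-flight transition.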
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056762#comment-15056762 ] Wangda Tan commented on YARN-4224: -- Hi [~varun_saxena], For your last comment: bq. So I looked at Wangda Tan's code at YARN-3368. I see that for single record like a single app attempt, we are extending urlForFindRecord and that takes only a single string id as input instead of an object as is the case with urlForQuery. In case of app attempt and containers, we can get both appid from app attempt id, and app attempt from container so a single id would do. That's the major reason why I asked to support flat namespace in REST API. Yes, you're correct that the front-end JS library could support a multi-layer hierarchical REST API, but it's very painful. We would have to extend the JS library to support it, and we would need to keep the context of objects (in your case we need username/cluster-id/flow-id when trying to get flow-related info). This is very painful, from my experience writing web UIs. bq. Moreover, what do you mean by batch query ? Does that mean support for multiple optional query parameters like filters etc. to trim down the results ? We already have them. I'm not sure if it is possible to support queries like: give me the flows whose users match a given regex and whose begin/end times fall within a range. Could you give me an example of what such a query would look like? In addition, I'm planning to propose adding flat-namespace REST APIs on the RM side as well (while keeping the existing REST APIs in the RM unchanged for compatibility). For example, we should be able to get a container by ID via {{/containers/\{container-id\}}} directly, instead of using the existing hierarchical REST API. My goal is to give the RM and ATSv2 a consistent REST API view. Thoughts? 
> Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
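The reason a flat {{/containers/\{container-id\}}} endpoint can work is the one noted in the quoted comment: a container ID embeds its parent IDs. A small sketch using the standard YARN ID string layouts, with hypothetical helper names:

```java
// A YARN container ID such as container_1449800000000_0001_01_000001
// encodes the cluster timestamp, application sequence number, attempt
// number, and container sequence number, so the parent application and
// attempt IDs are recoverable from the single path parameter of a flat
// REST endpoint.
public class FlatIdResolver {

    // container_<ts>_<appSeq>_<attempt>_<cSeq> -> application_<ts>_<appSeq>
    static String applicationIdOf(String containerId) {
        String[] p = containerId.split("_");
        return "application_" + p[1] + "_" + p[2];
    }

    // -> appattempt_<ts>_<appSeq>_<6-digit attempt number>
    static String attemptIdOf(String containerId) {
        String[] p = containerId.split("_");
        return String.format("appattempt_%s_%s_%06d",
                p[1], p[2], Integer.parseInt(p[3]));
    }

    public static void main(String[] args) {
        String cid = "container_1449800000000_0001_01_000001";
        System.out.println(applicationIdOf(cid)); // application_1449800000000_0001
        System.out.println(attemptIdOf(cid));     // appattempt_1449800000000_0001_000001
    }
}
```

By contrast, a flow has no such self-describing ID, which is why the cluster/user/flow context (or a composite UID) must travel with the request in the ATSv2 case.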
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056772#comment-15056772 ] Hadoop QA commented on YARN-4416: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 0s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 8s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 153m 1s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |
[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056649#comment-15056649 ] Hadoop QA commented on YARN-4218: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-4218 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12777564/YARN-4218.2.patch | | JIRA Issue | YARN-4218 | | Powered by | Apache Yetus 0.1.0 http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9973/console | This message was automatically generated. > Metric for resource*time that was preempted > --- > > Key: YARN-4218 > URL: https://issues.apache.org/jira/browse/YARN-4218 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, > YARN-4218.2.patch, YARN-4218.patch, YARN-4218.wip.patch, screenshot-1.png, > screenshot-2.png, screenshot-3.png > > > After YARN-415 we have the ability to track the resource*time footprint of a > job and preemption metrics shows how many containers were preempted on a job. > However we don't have a metric showing the resource*time footprint cost of > preemption. In other words, we know how many containers were preempted but we > don't have a good measure of how much work was lost as a result of preemption. > We should add this metric so we can analyze how much work preemption is > costing on a grid and better track which jobs were heavily impacted by it. 
A > job that has 100 containers preempted that only lasted a minute each and were > very small is going to be less impacted than a job that only lost a single > container but that container was huge and had been running for 3 days. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
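The gap the description points at can be made concrete with back-of-the-envelope arithmetic. The numbers and helper below are illustrative only, not the patch's actual implementation:

```java
// Illustrates why a plain preempted-container count misleads: weight each
// preempted container by resource * lifetime, analogous to the
// memory-seconds accounting introduced by YARN-415.
public class PreemptionCostDemo {

    static long memorySeconds(long memoryMb, long lifetimeSeconds) {
        return memoryMb * lifetimeSeconds;
    }

    public static void main(String[] args) {
        // Job A: 100 small containers, 1 GB each, preempted after 1 minute.
        long jobA = 100 * memorySeconds(1024, 60);

        // Job B: one 8 GB container preempted after running for 3 days.
        long jobB = memorySeconds(8 * 1024, 3 * 24 * 3600);

        System.out.println("Job A lost " + jobA + " MB-seconds");
        System.out.println("Job B lost " + jobB + " MB-seconds");
        // Job B lost far more work despite losing only one container,
        // while a raw preemption count would rank Job A 100x worse.
    }
}
```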
[jira] [Updated] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4438: -- Attachment: YARN-4438.3.patch > Implement RM leader election with curator > - > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056626#comment-15056626 ] Li Lu commented on YARN-4224: - bq. Anyways from the GET side which is our immediate use case, if I understand, we will get a set of flows and send UID in the same response for later queries ? Yes. How about putting them in an "otherinfo" section so that the front end can get this information? bq. If we have all the info, cluster, user, flow, etc. can't we create a URL of the form /cluster_id/user/flow_name ? Having hierarchical IDs is possible in Ember, but in general it's not a common practice. On this point, maybe [~leftnoteasy] has comments? bq. Even if UID is required what should be the delimiter ? What if flow name has the same delimiter for instance. We need to handle it then. That's something we need to consider if we'd like to pursue this approach. We may need to restrict some special characters in our cluster IDs/user names/flow names. bq. If we need this format for UI, should we have this REST endpoint in addition to our current REST endpoints (based on proposals above) for normal flow from clients ? I'd prefer to have them as the only style of endpoints for timeline v2. Right now we need to spend some effort rebuilding the REST endpoints in this style in AHS for the new UI, but in ATSv2 we're starting out fresh, so we don't need to handle the legacy use cases? bq. Moreover, what do you mean by batch query ? Does that mean support for multiple optional query parameters like filters etc. to trim down the results ? We already have them. Yes. Let's make sure they have the same style as the other endpoints (proposed in this JIRA) though. I don't think we need much work underneath the wrapper layer. 
> Change the ATSv2 reader side REST interface to conform to current REST APIs' > in YARN > > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4224-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
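The delimiter-collision concern raised above (a flow name containing the UID delimiter) is solvable with escaping rather than by banning characters. A sketch with hypothetical choices — {{!}} as the delimiter and {{*}} as the escape character — just to show the round trip can stay unambiguous:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of building an unambiguous composite UID from components such as
// cluster/user/flow. The '!' delimiter and '*' escape are hypothetical;
// the point is only that a component containing the delimiter still
// round-trips safely through join() and split().
public class TimelineUid {

    static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append('!');
            // Escape the escape character first, then the delimiter.
            sb.append(parts[i].replace("*", "**").replace("!", "*!"));
        }
        return sb.toString();
    }

    static List<String> split(String uid) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < uid.length(); i++) {
            char c = uid.charAt(i);
            if (c == '*') {
                cur.append(uid.charAt(++i)); // take escaped char literally
            } else if (c == '!') {
                out.add(cur.toString());     // component boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());
        return out;
    }
}
```

For example, {{join("yarn-cluster", "user1", "flow!name")}} yields {{yarn-cluster!user1!flow*!name}}, and {{split}} recovers the original three components.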
[jira] [Resolved] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne resolved YARN-4390. -- Resolution: Duplicate Closing this ticket in favor of YARN-4108 > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
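The sizing consideration YARN-4390 asks for (and which YARN-4108 now tracks) can be sketched as a per-node check. The types and method names below are hypothetical; the real preemption monitor is considerably more involved:

```java
import java.util.List;
import java.util.Map;

// Sketch: an 8 GB request cannot be satisfied by 1 GB containers preempted
// across eight different nodes, because the AM needs 8 GB on a single node.
// Only count a node toward a pending request if the memory preemptable on
// that one node can actually fit the requested container.
public class PreemptionSizingCheck {

    static boolean nodeCanFitRequest(List<Long> preemptableMb, long requestMb) {
        long total = 0;
        for (long mb : preemptableMb) {
            total += mb;
        }
        return total >= requestMb;
    }

    // Pick any node able to host the request after preemption, or null if
    // preempting would only free fragments the AM will reject.
    static String pickNode(Map<String, List<Long>> preemptableByNode,
                           long requestMb) {
        for (Map.Entry<String, List<Long>> e : preemptableByNode.entrySet()) {
            if (nodeCanFitRequest(e.getValue(), requestMb)) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

A fuller version would also count the node's already-free headroom toward the request; this sketch only shows the shape of the check.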
[jira] [Updated] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4218: --- Attachment: YARN-4218.2.patch > Metric for resource*time that was preempted > --- > > Key: YARN-4218 > URL: https://issues.apache.org/jira/browse/YARN-4218 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, > YARN-4218.2.patch, YARN-4218.patch, YARN-4218.wip.patch, screenshot-1.png, > screenshot-2.png, screenshot-3.png > > > After YARN-415 we have the ability to track the resource*time footprint of a > job and preemption metrics shows how many containers were preempted on a job. > However we don't have a metric showing the resource*time footprint cost of > preemption. In other words, we know how many containers were preempted but we > don't have a good measure of how much work was lost as a result of preemption. > We should add this metric so we can analyze how much work preemption is > costing on a grid and better track which jobs were heavily impacted by it. A > job that has 100 containers preempted that only lasted a minute each and were > very small is going to be less impacted than a job that only lost a single > container but that container was huge and had been running for 3 days. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4441) Kill application request from the webservice(ui) is showing success even for the finished applications
[ https://issues.apache.org/jira/browse/YARN-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055686#comment-15055686 ] Varun Vasudev commented on YARN-4441: - Why? The reason the webservice implementation calls the RPC function is to avoid having different logic between the two. If the RPC implementation decides to log the call in the audit log, then that logic applies to the webservices side as well. I agree with [~sunilg] and [~rohithsharma] - this doesn't seem like an issue. > Kill application request from the webservice(ui) is showing success even for > the finished applications > -- > > Key: YARN-4441 > URL: https://issues.apache.org/jira/browse/YARN-4441 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Mohammad Shahid Khan >Assignee: Mohammad Shahid Khan > > If the application is already finished, i.e. either failed, killed, or succeeded, > the kill operation should not be logged as success. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4309) Add container launch related debug information to container logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4309: - Summary: Add container launch related debug information to container logs when a container fails (was: Add debug information to container logs when a container fails) > Add container launch related debug information to container logs when a > container fails > --- > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, > YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, > YARN-4309.009.patch, YARN-4309.010.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such an approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh (into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056513#comment-15056513 ] Hadoop QA commented on YARN-4225: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 3s {color} | {color:red} branch/hadoop-yarn-project/hadoop-yarn no findbugs output file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 20s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 50, now 50). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 19s {color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn no findbugs output file (hadoop-yarn-project/hadoop-yarn/target/findbugsXml.xml) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 48s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 0s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 36s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 98m 35s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK
[jira] [Comment Edited] (YARN-4309) Add container launch related debug information to container logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056515#comment-15056515 ] Wangda Tan edited comment on YARN-4309 at 12/14/15 7:19 PM: Committed to trunk/branch-2. Thanks [~vvasudev] and review from [~ste...@apache.org]/[~sidharta-s]/[~aw]/[~jlowe]/[~kasha]! was (Author: leftnoteasy): Committed to trunk/branch-2. Thanks [~vvasudev] and review from [~ste...@apache.org]/[~sidharta-s]/[~aw]! > Add container launch related debug information to container logs when a > container fails > --- > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, > YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, > YARN-4309.009.patch, YARN-4309.010.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such an approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh (into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well
[ https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056526#comment-15056526 ] Wangda Tan commented on YARN-4418: -- Looks good, thanks [~sunilg]. Committing.. > AM Resource Limit per partition can be updated to ResourceUsage as well > --- > > Key: YARN-4418 > URL: https://issues.apache.org/jira/browse/YARN-4418 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, > 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch > > > AMResourceLimit is now extended to all partitions after YARN-3216. Its also > better to track this ResourceLimit in existing {{ResourceUsage}} so that REST > framework can be benefited to avail this information easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056524#comment-15056524 ] Hudson commented on YARN-3946: -- FAILURE: Integrated in Hadoop-trunk-Commit #8962 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8962/]) YARN-3946. Update exact reason as to why a submitted app is in ACCEPTED (wangda: rev 6cb0af3c39a5d49cb2f7911ee21363a9542ca2d7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAMContainerLaunchDiagnosticsConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimitsByPartition.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java > Update exact reason as to why a submitted app is in ACCEPTED state to app's > diagnostic message > -- > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, 
resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Fix For: 2.8.0 > > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, > YARN-3946.v1.007.patch, YARN-3946.v1.008.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4226) Make capacity scheduler queue's preemption status REST API consistent with GUI
[ https://issues.apache.org/jira/browse/YARN-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne resolved YARN-4226. -- Resolution: Won't Fix Since the code works and is only slightly confusing, I am closing this ticket as WontFix. > Make capacity scheduler queue's preemption status REST API consistent with GUI > -- > > Key: YARN-4226 > URL: https://issues.apache.org/jira/browse/YARN-4226 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > > In the capacity scheduler GUI, the preemption status has the following form: > {code} > Preemption: disabled > {code} > However, the REST API shows the following for the same status: > {code} > preemptionDisabled":true > {code} > The latter is confusing and should be consistent with the format in the GUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
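For illustration, the GUI-style rendering of the REST boolean could be as simple as the helper below; the class is hypothetical, not actual scheduler code, and just shows why the double negative ({{"preemptionDisabled": true}} meaning disabled) reads worse than the GUI form.

```java
public class PreemptionStatus {
    // Render the REST boolean "preemptionDisabled" the way the GUI does,
    // turning the double negative into a readable status string.
    public static String toGuiStatus(boolean preemptionDisabled) {
        return "Preemption: " + (preemptionDisabled ? "disabled" : "enabled");
    }
}
```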
[jira] [Commented] (YARN-3946) Update exact reason as to why a submitted app is in ACCEPTED state to app's diagnostic message
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056533#comment-15056533 ] Naganarasimha G R commented on YARN-3946: - Thanks for the review and commit, [~wangda]. > Update exact reason as to why a submitted app is in ACCEPTED state to app's > diagnostic message > -- > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Fix For: 2.8.0 > > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, > YARN-3946.v1.007.patch, YARN-3946.v1.008.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056540#comment-15056540 ] Wangda Tan commented on YARN-4225: -- Thanks [~eepayne] for the update. Could you check whether the findbugs warning in the latest Jenkins run is related or not? There's no link to the findbugs result in the latest Jenkins report, so I guess it's not related. > Add preemption status to yarn queue -status for capacity scheduler > -- > > Key: YARN-4225 > URL: https://issues.apache.org/jira/browse/YARN-4225 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: YARN-4225.001.patch, YARN-4225.002.patch, > YARN-4225.003.patch, YARN-4225.004.patch, YARN-4225.005.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4100) Add Documentation for Distributed and Delegated-Centralized Node Labels feature
[ https://issues.apache.org/jira/browse/YARN-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056552#comment-15056552 ] Naganarasimha G R commented on YARN-4100: - Hi [~dian.fu], [~wangda] & [~devaraj.k], can you review the latest patch? > Add Documentation for Distributed and Delegated-Centralized Node Labels > feature > --- > > Key: YARN-4100 > URL: https://issues.apache.org/jira/browse/YARN-4100 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: NodeLabel.html, YARN-4100.v1.001.patch, > YARN-4100.v1.002.patch > > > Add Documentation for Distributed Node Labels feature -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well
[ https://issues.apache.org/jira/browse/YARN-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056568#comment-15056568 ] Hudson commented on YARN-4418: -- FAILURE: Integrated in Hadoop-trunk-Commit #8963 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8963/]) YARN-4418. AM Resource Limit per partition can be updated to (wangda: rev 07b0fb996a32020678bd2ce482b672f0434651f0) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/QueueCapacities.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestResourceUsage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueCapacities.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java > AM Resource Limit per partition can be updated to ResourceUsage as well > --- > > Key: YARN-4418 > URL: https://issues.apache.org/jira/browse/YARN-4418 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4418.patch, 0002-YARN-4418.patch, > 0003-YARN-4418.patch, 0004-YARN-4418.patch, 0005-YARN-4418.patch > > > AMResourceLimit is now extended to all partitions after YARN-3216. 
Its also > better to track this ResourceLimit in existing {{ResourceUsage}} so that REST > framework can be benefited to avail this information easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4438: -- Attachment: YARN-4438.3.patch Fixed some warnings > Implement RM leader election with curator > - > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056483#comment-15056483 ] Wangda Tan commented on YARN-1011: -- Thanks [~kasha], count me in :)! I could help with reviewing/implementation. > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056470#comment-15056470 ] Naganarasimha G R commented on YARN-4416: - Hi [~wangda], YARN-4416.v2.002.patch removed the synchronized lock on {{getNumApplications}}, but I presume there is a possibility that {{activateApplications}} can be called in between {{getNumPendingApplications}} and {{getNumActiveApplications}}, and the application count can then be given as a wrong value (*more than actual*). Shall I revert this? > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Hence we need to ensure following : > # queueCapacity, resource-usage has their own read/write lock hence > synchronization is not req > # numContainers is volatile hence synchronization is not req. > # read/write lock could be added to Ordering Policy. Read operations don't > need synchronized. So {{getNumApplications}} doesn't need synchronized. > (First 2 will be handled in this jira and the third will be handled in > YARN-4443) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056489#comment-15056489 ] Wangda Tan commented on YARN-4416: -- [~Naganarasimha], I would suggest to revert the change. And delay all OrderingPolicy-related changes to other JIRAs. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Hence we need to ensure following : > # queueCapacity, resource-usage has their own read/write lock hence > synchronization is not req > # numContainers is volatile hence synchronization is not req. > # read/write lock could be added to Ordering Policy. Read operations don't > need synchronized. So {{getNumApplications}} doesn't need synchronized. > (First 2 will be handled in this jira and the third will be handled in > YARN-4443) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Attachment: YARN-4416.v2.003.patch Reverting the removal of lock on {{LeafQueue.getNumApplications}} > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > YARN-4416.v2.001.patch, YARN-4416.v2.002.patch, YARN-4416.v2.003.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Hence we need to ensure following : > # queueCapacity, resource-usage has their own read/write lock hence > synchronization is not req > # numContainers is volatile hence synchronization is not req. > # read/write lock could be added to Ordering Policy. Read operations don't > need synchronized. So {{getNumApplications}} doesn't need synchronized. > (First 2 will be handled in this jira and the third will be handled in > YARN-4443) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
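The locking scheme discussed in this thread — a read/write lock instead of synchronized getters, a volatile counter, and reading both application counts under a single read lock to avoid the pending/active race mentioned above — can be sketched as follows. {{QueueStats}} is a hypothetical simplified stand-in, not the actual AbstractCSQueue/LeafQueue code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueueStats {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int pendingApps;
    private int activeApps;
    // Volatile counter: a single read needs no lock at all (point #2 above).
    private volatile int numContainers;

    public void addPending(int n) {
        lock.writeLock().lock();
        try { pendingApps += n; } finally { lock.writeLock().unlock(); }
    }

    public void activateApplications(int n) {
        lock.writeLock().lock();
        try {
            // Move apps from pending to active atomically under the write lock.
            pendingApps -= n;
            activeApps += n;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reading both counters under ONE read lock avoids the race where
    // activateApplications() runs between two separate getters and the
    // sum is computed from inconsistent values. A reader also cannot
    // deadlock against toString()-style reads, unlike synchronized getters.
    public int getNumApplications() {
        lock.readLock().lock();
        try { return pendingApps + activeApps; } finally { lock.readLock().unlock(); }
    }

    public int getNumContainers() { return numContainers; }
    public void setNumContainers(int n) { numContainers = n; }
}
```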
[jira] [Updated] (YARN-4309) Add debug information to container logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4309: - Summary: Add debug information to container logs when a container fails (was: Add debug information to application logs when a container fails) > Add debug information to container logs when a container fails > -- > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, > YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, > YARN-4309.009.patch, YARN-4309.010.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such a approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh(into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4309) Add container launch related debug information to container logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056521#comment-15056521 ] Hudson commented on YARN-4309: -- FAILURE: Integrated in Hadoop-trunk-Commit #8962 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8962/]) YARN-4309. Add container launch related debug information to container (wangda: rev dfcbbddb0963c89c0455d41223427165b9f9e537) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Add container launch related debug information to container logs when a > container fails > --- > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, > YARN-4309.006.patch, YARN-4309.007.patch, YARN-4309.008.patch, > YARN-4309.009.patch, YARN-4309.010.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > 
failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such a approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh(into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4438: -- Attachment: (was: YARN-4438.3.patch) > Implement RM leader election with curator > - > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4438.1.patch, YARN-4438.2.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056362#comment-15056362 ] Junping Du commented on YARN-3816: -- Thanks [~sjlee0], [~varun_saxena] and Li for the comments. I am rebasing the patch on YARN-4356 and incorporating your comments above. Some quick responses to your major comments above, for more feedback: bq. It appears that the current code will aggregate metrics from all types of entities to the application. This seems problematic to me. The main goal of this aggregation is to roll up metrics from individual containers to the application. But just by having the same metric id, any entity can have its metric aggregated by this (incorrectly). For example, any arbitrary entity can simply declare a metric named "MEMORY". By virtue of that, it would get aggregated and added to the application-level value. There can be variations of this: for example, the same metrics can be reported by the container entity, app attempt entity, and so on. Then the values may be aggregated double or triple. I think we should ensure strongly that the aggregation happens only along the path of YARN container entities to application to prevent these accidental cases. That sounds like a reasonable concern. I agree that we should avoid metrics getting mixed up between system metrics and application metrics. However, I think our goal here is not just to aggregate/accumulate container metrics, but also to provide an aggregation service for applications' metrics (other than MR). Isn't it? If so, maybe a better way is to aggregate metrics by not only the metric name but also the original entity type (so memory metrics for ContainerEntity won't be aggregated against memory metrics from Application Entity). [~sjlee0], what do you think? bq. On a semi-related note, what happens if clients send metrics directly at the application entity level? We should expect most framework-specific AMs to do that.
For example, MR AM already has all the job-level counters, and it can (and should) report those job-level counters as metrics at the YARN application entity. Is that case handled correctly, or will we end up getting incorrect values (double counting) in that situation? That's why we need the toAggregate() API in TimelineMetric. For metrics that are already aggregated (like the MR AM's counters), it should be set to false to avoid double counting. Sounds good? bq. calculating area under the curve along the time dimension, would it be useful by itself? Average based on this area under the curve seems more useful. Yes. Both overall and average values are useful from different standpoints. The former can be used to represent how many resources the application actually consumed, which is very useful for billing in cloud services, etc. We can extend to more values later if we think it is worth it. Varun, does that make sense? bq. There are 3 types of aggregation basis, but only application aggregation has its own entity type. How do we represent the result entity of the other 2 types? I don't quite understand the question here. Li, are you suggesting we should remove the application aggregation entity type, add flow/queue aggregation entity types, or keep them consistent?
> [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, > YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian
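The "area under the curve" accumulation discussed above, and the time-weighted average derived from it, can be sketched as follows. This is a hypothetical standalone helper over (timestamp, value) samples, not the actual TimelineMetric API; a left-step function is one assumption for how a sample's value holds until the next timestamp.

```java
public class MetricAccumulator {
    // Riemann-sum style area under the metric curve: each sample's value
    // is held until the next timestamp (left step function).
    // Timestamps in ms, values e.g. in MB, so the result is in MB*ms.
    // This "overall consumption" figure is the one useful for billing.
    public static double areaUnderCurve(long[] ts, double[] values) {
        double area = 0.0;
        for (int i = 0; i + 1 < ts.length; i++) {
            area += values[i] * (ts[i + 1] - ts[i]);
        }
        return area;
    }

    // Time-weighted average over the sampled window, derived from the area,
    // unlike a plain mean of samples which ignores uneven sampling intervals.
    public static double timeAverage(long[] ts, double[] values) {
        long span = ts[ts.length - 1] - ts[0];
        return span == 0 ? values[0] : areaUnderCurve(ts, values) / span;
    }
}
```

For samples (t=0, 2 MB), (t=10, 4 MB), (t=20, 0 MB), the area is 2*10 + 4*10 = 60 MB*ms and the time-weighted average is 60/20 = 3 MB.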
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056420#comment-15056420 ] Hadoop QA commented on YARN-3226: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 3 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 104, now 105). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 9s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 25s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 137m 22s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12777484/0005-YARN-3226.patch | | JIRA
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056430#comment-15056430 ] Naganarasimha G R commented on YARN-4183: - Hi [~sjlee0], [~djp] & [~xgong], Now that YARN-3623 is in, can we decide on this? Do we need to introduce another configuration to decide whether client delegation tokens should be fetched, in addition to the existing conditions (timeline service and security enabled)? Or is it sufficient that clients can set {{yarn.timeline-service.client.best-effort}} / {{yarn.timeline-service.enabled}} to false? > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-4183.1.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
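The configuration interplay Naganarasimha describes can be sketched as plain logic. This is NOT the actual YarnClientImpl code: the property names exist in yarn-default.xml, but the class, method, and the exact combination rule below are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the token-fetch decision under discussion: a
// client fetches a timeline delegation token only when the timeline
// service is enabled AND security is on. With best-effort set, a failed
// fetch would be tolerated rather than failing the application; skipping
// the fetch entirely requires disabling the timeline service (assumption).
public class TimelineTokenDecision {
  static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
  static final String BEST_EFFORT = "yarn.timeline-service.client.best-effort";

  static boolean shouldFetchToken(Map<String, String> conf,
                                  boolean securityEnabled) {
    boolean enabled =
        Boolean.parseBoolean(conf.getOrDefault(TIMELINE_ENABLED, "false"));
    return enabled && securityEnabled;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(TIMELINE_ENABLED, "true");
    System.out.println(shouldFetchToken(conf, true));  // true
    conf.put(TIMELINE_ENABLED, "false");
    System.out.println(shouldFetchToken(conf, true));  // false
  }
}
```

The open question in the thread is whether this two-flag combination is enough, or whether a third, dedicated configuration key is warranted.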
[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
[ https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056428#comment-15056428 ] Sangjin Lee commented on YARN-4450: --- Could I get a quick review on this? The changes are quite straightforward. > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail > - > > Key: YARN-4450 > URL: https://issues.apache.org/jira/browse/YARN-4450 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4450-feature-YARN-2928.01.patch > > > When I run the unit tests against the current branch, > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail: > {noformat} > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > > TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429 > class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing > in yarn-default.xml > {noformat} > The latter failure is caused by YARN-4356 (when we deprecated > RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was > caused when a later use of field {{resURI}} was added in trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements
[ https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056432#comment-15056432 ] Carlo Curino commented on YARN-4360: Rebasing after YARN-4358 got committed. Note: [~imenache] is working on YARN-4359, so some of the ugly "instanceof" checks that you see in this patch are going to go away (as he moves the LowCostAligned agents forward). > Improve GreedyReservationAgent to support "early" allocations, and > performance improvements > > > Key: YARN-4360 > URL: https://issues.apache.org/jira/browse/YARN-4360 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Affects Versions: 2.8.0 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4360.2.patch, YARN-4360.3.patch, YARN-4360.patch > > > The GreedyReservationAgent allocates "as late as possible". Per various > conversations, it seems useful to have a mirror behavior that allocates as > early as possible. Also in the process we leverage improvements from > YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which > significantly speeds up allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
[ https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056450#comment-15056450 ] Li Lu commented on YARN-4450: - Patch LGTM. +1. Will commit shortly. > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail > - > > Key: YARN-4450 > URL: https://issues.apache.org/jira/browse/YARN-4450 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4450-feature-YARN-2928.01.patch > > > When I run the unit tests against the current branch, > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail: > {noformat} > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > > TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429 > class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing > in yarn-default.xml > {noformat} > The latter failure is caused by YARN-4356 (when we deprecated > RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was > caused when a later use of field {{resURI}} was added in trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI
[ https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056479#comment-15056479 ] Wangda Tan commented on YARN-4293: -- Thanks [~sunilg], One other comment: - The InterfaceAudience for ResourceUtilization does not seem correct: if getResourceUtilization is public in NodeReport, ResourceUtilization should be public as well. Would it be better to mark all ResourceUtilization-related APIs as public and unstable? > ResourceUtilization should be a part of yarn node CLI > - > > Key: YARN-4293 > URL: https://issues.apache.org/jira/browse/YARN-4293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0001-YARN-4293.patch, 0002-YARN-4293.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
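The annotation change Wangda suggests can be sketched as below. The two annotations stand in for Hadoop's {{InterfaceAudience.Public}} and {{InterfaceStability.Unstable}} so the snippet compiles on its own; the class body and field names are illustrative only, not the real ResourceUtilization record.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-ins for org.apache.hadoop.classification's annotations; a real
// patch would use InterfaceAudience.Public / InterfaceStability.Unstable.
@Retention(RetentionPolicy.RUNTIME) @interface Public {}
@Retention(RetentionPolicy.RUNTIME) @interface Unstable {}

// The suggested audience: since NodeReport#getResourceUtilization is
// Public, ResourceUtilization should be Public too, and Unstable because
// the API may still change. Fields below are hypothetical.
@Public
@Unstable
public class ResourceUtilizationSketch {
  int physicalMemoryMB;
  int virtualMemoryMB;
  float cpuUsage; // fraction of vcores in use

  public static void main(String[] args) {
    System.out.println(
        ResourceUtilizationSketch.class.isAnnotationPresent(Public.class));
  }
}
```

The underlying rule is a general one in Hadoop: a type reachable from a Public API surface should not carry a narrower audience than its callers.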
[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
[ https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056399#comment-15056399 ] Hadoop QA commented on YARN-4450: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 18s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 42s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 50s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 45s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:7c86163 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12777508/YARN-4450-feature-YARN-2928.01.patch | | JIRA Issue | YARN-4450 | |
[jira] [Commented] (YARN-4194) Extend Reservation Definition Language (RDL) extensions to support node labels
[ https://issues.apache.org/jira/browse/YARN-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056449#comment-15056449 ] Carlo Curino commented on YARN-4194: [~atumanov] thanks for contributing this. In general, the patch looks good to me. One nit is that we extend the language without extending {{ReservationInputValidator.validateReservationDefinition(..)}} accordingly. Are you planning to add that in one of the other JIRAs under the YARN-4193 umbrella, or should we have it as part of this JIRA? > Extend Reservation Definition Language (RDL) extensions to support node labels > -- > > Key: YARN-4194 > URL: https://issues.apache.org/jira/browse/YARN-4194 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Alexey Tumanov > Attachments: YARN-4194-v1.patch, YARN-4194-v2.patch > > > This JIRA tracks changes to the APIs to the reservation system to support > the expressivity of node-labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
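The missing validation Carlo points at would look roughly like the sketch below: reject a reservation whose node-label expression names a label the cluster does not have. All names here (the helper method, its signature) are invented for illustration; the real change would live inside {{ReservationInputValidator.validateReservationDefinition(..)}}.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a node-label check for RDL validation; NOT the
// actual ReservationInputValidator code.
public class RdlLabelValidation {
  /** Returns an error message, or null if the expression passes validation. */
  static String validateNodeLabel(String labelExpression,
                                  List<String> clusterLabels) {
    if (labelExpression == null || labelExpression.isEmpty()) {
      return null; // no label requested: default partition, always valid
    }
    if (!clusterLabels.contains(labelExpression)) {
      return "Unknown node label in reservation definition: "
          + labelExpression;
    }
    return null;
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList("gpu", "ssd");
    System.out.println(validateNodeLabel("gpu", labels));  // null
    System.out.println(validateNodeLabel("fpga", labels)); // error message
  }
}
```

Validating at submission time keeps a bad label expression from being discovered only when the Plan later fails to place the reservation.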
[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
[ https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056452#comment-15056452 ] Naganarasimha G R commented on YARN-4450: - [~sjlee0], I tested locally and both cases seem to pass after applying the patch. But as you mentioned, the latter one relates to a fix in trunk, so do we need to put a patch in trunk? > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail > - > > Key: YARN-4450 > URL: https://issues.apache.org/jira/browse/YARN-4450 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4450-feature-YARN-2928.01.patch > > > When I run the unit tests against the current branch, > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail: > {noformat} > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > > TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429 > class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing > in yarn-default.xml > {noformat} > The latter failure is caused by YARN-4356 (when we deprecated > RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was > caused when a later use of field {{resURI}} was added in trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail
[ https://issues.apache.org/jira/browse/YARN-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056458#comment-15056458 ] Sangjin Lee commented on YARN-4450: --- No, it is not an issue with the trunk. What's done in the trunk is correct. When we rebased our feature branch with the trunk, we failed to modify the trunk change according to the change we're making (not using resURI). So the issue is solely on our branch. Hope that helps. > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail > - > > Key: YARN-4450 > URL: https://issues.apache.org/jira/browse/YARN-4450 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-4450-feature-YARN-2928.01.patch > > > When I run the unit tests against the current branch, > TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail: > {noformat} > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » > NullPointer > > TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429 > class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing > in yarn-default.xml > {noformat} > The latter failure is caused by YARN-4356 (when we deprecated > RM_SYSTEM_METRICS_PUBLISHER_ENABLED), and the former an older issue that was > caused when a later use of field {{resURI}} was added in trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)