[jira] [Commented] (YARN-2882) Add an OPPORTUNISTIC ExecutionType
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072089#comment-15072089 ]

Wangda Tan commented on YARN-2882:
----------------------------------

[~kasha], no worries. Since these are new APIs and exist only in trunk, I don't think we need to revert or post an addendum patch; any user-facing API changes can be made in YARN-4335. Will review YARN-4335 and post comments.

> Add an OPPORTUNISTIC ExecutionType
> ----------------------------------
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Konstantinos Karanasos
> Assignee: Konstantinos Karanasos
> Fix For: 3.0.0
> Attachments: YARN-2882-yarn-2877.001.patch, YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, YARN-2882-yarn-2877.004.patch, YARN-2882.005.patch, yarn-2882.patch
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable containers.
> Guaranteed-start containers are the existing containers, which are allocated by the central RM and are started immediately once allocated.
> Queueable is a new type of container that allows containers to be queued in the NM, so their execution may be arbitrarily delayed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
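The two container types described in this issue can be sketched as a small Java enum. This is illustrative only: the enum and helper names follow the JIRA discussion, not necessarily the committed API.

```java
public class ExecutionTypes {
    // The two types discussed in YARN-2882; names follow the discussion and
    // may differ from the final committed API.
    enum ExecutionType {
        GUARANTEED,    // existing behavior: allocated by the central RM, started immediately
        OPPORTUNISTIC  // new: may be queued at the NM, so its start can be arbitrarily delayed
    }

    // Illustrative helper: only OPPORTUNISTIC containers may wait in the NM queue.
    static boolean mayQueueAtNM(ExecutionType type) {
        return type == ExecutionType.OPPORTUNISTIC;
    }
}
```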
[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate
[ https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072104#comment-15072104 ]

Inigo Goiri commented on YARN-1013:
-----------------------------------

I can take this one once YARN-1015 is done.

> CS should watch resource utilization of containers and allocate speculative containers if appropriate
> -----------------------------------------------------------------------------------------------------
>
> Key: YARN-1013
> URL: https://issues.apache.org/jira/browse/YARN-1013
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
>
> CS should watch resource utilization of containers (provided by NM in heartbeat) and allocate speculative containers (at lower OS priority) if appropriate.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072107#comment-15072107 ]

Inigo Goiri commented on YARN-1011:
-----------------------------------

The doc looks good. I have a couple of questions:
# What would be the first policy to implement? I guess we can define it in YARN-1015.
# Would it make sense to make over-subscription a global property set by the RM instead of per-node?

I think we need a sub-task under this umbrella for the over-subscription property.

> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
> Currently the RM allocates containers and assumes the allocated resources are utilized.
> The RM can, and should, get to a point where it measures utilization of allocated containers and, if appropriate, allocates more (speculative?) containers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4315) NaN in Queue percentage for cluster apps page
[ https://issues.apache.org/jira/browse/YARN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072175#comment-15072175 ]

Hadoop QA commented on YARN-4315:
---------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 7m 38s | trunk passed |
| +1 | compile | 0m 26s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 30s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 0m 37s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 26s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 33s | the patch passed |
| +1 | compile | 0m 27s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 27s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 35s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 1m 19s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 0m 26s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 63m 15s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 64m 28s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 22s | Patch does not generate ASF License warnings. |
| | | 145m 25s | |

|| Reason || Tests ||
| JDK v1.8.0_66 failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12779581/0002-YARN-4315.patch |
| JIRA Issue | YARN-4315 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072240#comment-15072240 ]

Karthik Kambatla commented on YARN-1011:
----------------------------------------

bq. For resource oversubscription enable/disable for individual nodes, I think it's very important since some nodes could be more important than others. Do you think it is fine to add a configuration item to each NM's yarn-site.xml?

That is exactly the intent. Let us continue this conversation on YARN-4512.

bq. For the scheduler-side implementation, instead of modifying individual schedulers, I think we should try to add the over-subscription policy to the common scheduling layer, since it doesn't sound very related to any specific scheduler implementation.

Makes sense. I doubt there are any scheduler-specific smarts here. If we do end up needing to do them separately, it is most likely because our scheduler abstractions are not clean.

> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
> Currently the RM allocates containers and assumes the allocated resources are utilized.
> The RM can, and should, get to a point where it measures utilization of allocated containers and, if appropriate, allocates more (speculative?) containers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4315) NaN in Queue percentage for cluster apps page
[ https://issues.apache.org/jira/browse/YARN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072194#comment-15072194 ] Bibin A Chundatt commented on YARN-4315: Test case failures are not related. > NaN in Queue percentage for cluster apps page > - > > Key: YARN-4315 > URL: https://issues.apache.org/jira/browse/YARN-4315 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4315.patch, 0002-YARN-4315.patch, Snap1.jpg > > > Steps to reproduce > Submit application > Switch RM and check the percentage of queue usage > Queue percentage shown as NaN -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4511) Create common scheduling policy for resource over-subscription
Wangda Tan created YARN-4511: Summary: Create common scheduling policy for resource over-subscription Key: YARN-4511 URL: https://issues.apache.org/jira/browse/YARN-4511 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072227#comment-15072227 ]

Wangda Tan commented on YARN-1011:
----------------------------------

Thanks [~kasha], and also for the comments from [~elgoiri]. I looked at the doc and it looks good. Some questions/comments:
- Enabling/disabling resource oversubscription for individual nodes is very important, since some nodes could be more important than others. Do you think it is fine to add a configuration item to each NM's yarn-site.xml?
- For the scheduler-side implementation, instead of modifying individual schedulers, I think we should try to add the over-subscription policy to the common scheduling layer, since it doesn't sound very related to any specific scheduler implementation.

I also agree that for the first implementation we can simply assume nodes have more resources to use. CS shouldn't have any issue with this assumption.

> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
> Currently the RM allocates containers and assumes the allocated resources are utilized.
> The RM can, and should, get to a point where it measures utilization of allocated containers and, if appropriate, allocates more (speculative?) containers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-4511) Create common scheduling policy for resource over-subscription
[ https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4511: - Issue Type: Sub-task (was: Bug) Parent: YARN-1011 > Create common scheduling policy for resource over-subscription > -- > > Key: YARN-4511 > URL: https://issues.apache.org/jira/browse/YARN-4511 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2882) Add an OPPORTUNISTIC ExecutionType
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072210#comment-15072210 ] Konstantinos Karanasos commented on YARN-2882: -- bq. Thanks for picking this up, Inigo Goiri. Hope that is okay with Konstantinos Karanasos. That is fine with me (and I do appreciate the help), given the urgency for unblocking YARN-1011, but let's coordinate better next time. I would have liked to review the patch before we pushed it to trunk. I am travelling at the moment and have limited connectivity, but will give it a look tomorrow. > Add an OPPORTUNISTIC ExecutionType > -- > > Key: YARN-2882 > URL: https://issues.apache.org/jira/browse/YARN-2882 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 3.0.0 > > Attachments: YARN-2882-yarn-2877.001.patch, > YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, > YARN-2882-yarn-2877.004.patch, YARN-2882.005.patch, yarn-2882.patch > > > This JIRA introduces the notion of container types. > We propose two initial types of containers: guaranteed-start and queueable > containers. > Guaranteed-start are the existing containers, which are allocated by the > central RM and are instantaneously started, once allocated. > Queueable is a new type of container, which allows containers to be queued in > the NM, thus their execution may be arbitrarily delayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1014) Configure OOM Killer to kill OPPORTUNISTIC containers first
[ https://issues.apache.org/jira/browse/YARN-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072224#comment-15072224 ] Karthik Kambatla commented on YARN-1014: [~asuresh], [~kkaranasos] - is this something we would want in trunk so we can share with YARN-2877? > Configure OOM Killer to kill OPPORTUNISTIC containers first > --- > > Key: YARN-1014 > URL: https://issues.apache.org/jira/browse/YARN-1014 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Arun C Murthy >Assignee: Karthik Kambatla > > YARN-2882 introduces the notion of OPPORTUNISTIC containers. These containers > should be killed first should the system run out of memory. > - > Previous description: > Once RM allocates 'speculative containers' we need to get LCE to schedule > them at lower priorities via cgroups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072223#comment-15072223 ]

Karthik Kambatla commented on YARN-1011:
----------------------------------------

bq. Would it make sense to make over-subscription a global property set by the RM instead of per-node?

Good question. I thought about it quite a bit. Here is my reasoning for doing it on the NM side; we can always switch to defining it at the RM if that makes more sense.
# Even if we have the knob on the RM, the node still has to support it: monitor the resource usage on the node and kill OPPORTUNISTIC containers if need be. On a cluster with NMs of different versions (say, during a rolling upgrade), the RM would have to keep track of which NMs support over-subscription, so we need some config on the NM anyway. Further, there could be node-specific conditions - hardware, other services running on the node, etc. - that affect the over-subscription capacity of the node. For instance, it might be okay to sign up for 90% of the advertised capacity on node A, but only 80% on node B. And this ability to soak up extra work could change over time.
# In terms of implementation, the node already sends its capacity and its aggregate container utilization. It might as well send an oversubscription-percentage, interpreted as a fraction of its advertised capacity. E.g., a node with 64 GB of memory could advertise its capacity as 50 GB with an oversubscription-percentage of 0.9; the RM could then schedule up to 45 GB of utilization. An oversubscription-percentage <= 0 would indicate the feature is turned off.

bq. What would be the first policy to implement? I guess we can define it in YARN-1015.

The simplest policy would likely be to just assume there are more resources on the node and continue allocating with the same policies we use today for free/unallocated resources. This should work okay for the FairScheduler. I am less familiar with the intricate details of CS, but would think it should apply there as well. [~leftnoteasy] - thoughts?

> [Umbrella] Schedule containers based on utilization of currently allocated containers
> -------------------------------------------------------------------------------------
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
> Currently the RM allocates containers and assumes the allocated resources are utilized.
> The RM can, and should, get to a point where it measures utilization of allocated containers and, if appropriate, allocates more (speculative?) containers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
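The oversubscription-percentage arithmetic from the comment above can be sketched in a few lines of Java. The method name is illustrative, not from an actual patch:

```java
public class OversubscriptionSketch {
    // Utilization ceiling the RM may schedule against, per the example above:
    // the advertised capacity times the oversubscription-percentage. A value
    // <= 0 means the feature is off and only the advertised capacity is usable.
    static double schedulableGb(double advertisedGb, double oversubscriptionPct) {
        if (oversubscriptionPct <= 0) {
            return advertisedGb; // feature disabled on this node
        }
        return advertisedGb * oversubscriptionPct;
    }
}
```

With the numbers in the comment (50 GB advertised, percentage 0.9), this yields the 45 GB ceiling mentioned above.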
[jira] [Commented] (YARN-4512) Provide a knob to turn on over-subscription
[ https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072245#comment-15072245 ]

Inigo Goiri commented on YARN-4512:
-----------------------------------

Per the discussion in YARN-1011, it makes sense to add an option to each NM's yarn-site.xml and have the NM advertise it to the RM. In addition, we should have separate oversubscription parameters for each resource (i.e., CPU and memory).

> Provide a knob to turn on over-subscription
> -------------------------------------------
>
> Key: YARN-4512
> URL: https://issues.apache.org/jira/browse/YARN-4512
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
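A hypothetical yarn-site.xml fragment sketching what such per-resource, per-NM knobs could look like. The property names and values below are illustrative assumptions for this discussion, not committed configuration keys:

```xml
<!-- Illustrative only: these property names are assumptions,
     not the keys that were eventually committed. -->
<property>
  <name>yarn.nodemanager.oversubscription.memory-fraction</name>
  <value>0.9</value> <!-- fraction of advertised memory the RM may schedule against -->
</property>
<property>
  <name>yarn.nodemanager.oversubscription.cpu-fraction</name>
  <value>0.8</value> <!-- CPU may tolerate a different fraction than memory -->
</property>
```

A value of 0 or less would mean over-subscription is off for that resource on that node, matching the semantics discussed in YARN-1011.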
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072228#comment-15072228 ] Wangda Tan commented on YARN-1011: -- I just created YARN-4511 to track common scheduling policy for resource over-subscription. > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > Attachments: yarn-1011-design-v0.pdf > > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4512) Provide a knob to turn on over-subscription
Karthik Kambatla created YARN-4512: -- Summary: Provide a knob to turn on over-subscription Key: YARN-4512 URL: https://issues.apache.org/jira/browse/YARN-4512 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4513) [YARN-3368] Upgrade to Ember 2.2.0
Wangda Tan created YARN-4513:
--------------------------------

Summary: [YARN-3368] Upgrade to Ember 2.2.0
Key: YARN-4513
URL: https://issues.apache.org/jira/browse/YARN-4513
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Wangda Tan

It currently uses Ember 2.0; we should upgrade to the latest Ember.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
Wangda Tan created YARN-4514:
--------------------------------

Summary: [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
Key: YARN-4514
URL: https://issues.apache.org/jira/browse/YARN-4514
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Wangda Tan

Several configurations, such as the RM/ATS addresses, are hard-coded; we should make them configurable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (YARN-4518) [YARN-3368] Support rendering statistic-by-node-label for queues/apps page
[ https://issues.apache.org/jira/browse/YARN-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-4518: - Assignee: Sunil G > [YARN-3368] Support rendering statistic-by-node-label for queues/apps page > -- > > Key: YARN-4518 > URL: https://issues.apache.org/jira/browse/YARN-4518 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4517) [YARN-3368] Add nodes page
[ https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-4517: -- Assignee: Varun Saxena > [YARN-3368] Add nodes page > -- > > Key: YARN-4517 > URL: https://issues.apache.org/jira/browse/YARN-4517 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Varun Saxena > > We need nodes page added to next generation web UI, similar to existing > RM/nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers
[ https://issues.apache.org/jira/browse/YARN-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandflee updated YARN-4519:
---------------------------
Description:
In CapacityScheduler.allocate(), we first take the FiCaSchedulerApp sync lock and may then take the CapacityScheduler sync lock in decreaseContainer().
In the scheduler thread, we first take the CapacityScheduler sync lock in allocateContainersToNode(), and may then take the FiCaSchedulerApp sync lock in FiCaSchedulerApp.assignContainers().

was:
In CapacityScheduler.allocate(), first get FiCaSchedulerApp sync lock, and may be get CapacityScheduler's sync lock in decreaseContainer(). In scheduler thread, first get CapacityScheduler's sync lock in allocateContainersToNode, and may get FiCaSchedulerApp sync lock in FicaSchedulerApp.assignContainers.

> potential deadlock of CapacityScheduler between decrease container and assign containers
> ----------------------------------------------------------------------------------------
>
> Key: YARN-4519
> URL: https://issues.apache.org/jira/browse/YARN-4519
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Reporter: sandflee
>
> In CapacityScheduler.allocate(), we first take the FiCaSchedulerApp sync lock and may then take the CapacityScheduler sync lock in decreaseContainer().
> In the scheduler thread, we first take the CapacityScheduler sync lock in allocateContainersToNode(), and may then take the FiCaSchedulerApp sync lock in FiCaSchedulerApp.assignContainers().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
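The lock-order inversion described in this report can be sketched with plain ReentrantLocks standing in for the two monitors (this is not the real CapacityScheduler code; tryLock is used so the demo cannot actually hang):

```java
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    // Stand-ins for the two monitors named in the report.
    static final ReentrantLock ficaAppLock = new ReentrantLock();
    static final ReentrantLock capacitySchedulerLock = new ReentrantLock();

    // allocate() path: app lock first, then the scheduler lock (as in
    // decreaseContainer()). tryLock keeps this demo from blocking forever.
    static boolean allocatePathCanFinish() {
        ficaAppLock.lock();
        try {
            if (capacitySchedulerLock.tryLock()) {
                capacitySchedulerLock.unlock();
                return true;
            }
            // A real lock() would block here: deadlock once the other thread
            // in turn waits on ficaAppLock.
            return false;
        } finally {
            ficaAppLock.unlock();
        }
    }

    // Simulate the scheduler thread holding its lock (allocateContainersToNode())
    // while the allocate() path runs on another thread.
    static boolean allocateCompletesWhileSchedulerBusy() {
        capacitySchedulerLock.lock();
        try {
            final boolean[] done = new boolean[1];
            Thread allocator = new Thread(() -> done[0] = allocatePathCanFinish());
            allocator.start();
            try {
                allocator.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return done[0];
        } finally {
            capacitySchedulerLock.unlock();
        }
    }
}
```

With no contention, the allocate() path completes; while the scheduler lock is held by another thread, it cannot. That window is exactly where the opposite acquisition order in assignContainers() would close the cycle. Taking the two locks in one consistent order (or avoiding the nested acquisition) removes the deadlock.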
[jira] [Created] (YARN-4519) potential deadlock of CapacityScheduler between decrease container and assign containers
sandflee created YARN-4519:
------------------------------

Summary: potential deadlock of CapacityScheduler between decrease container and assign containers
Key: YARN-4519
URL: https://issues.apache.org/jira/browse/YARN-4519
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Reporter: sandflee

In CapacityScheduler.allocate(), we first take the FiCaSchedulerApp sync lock and may then take the CapacityScheduler sync lock in decreaseContainer().
In the scheduler thread, we first take the CapacityScheduler sync lock in allocateContainersToNode(), and may then take the FiCaSchedulerApp sync lock in FiCaSchedulerApp.assignContainers().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072438#comment-15072438 ]

Jian He commented on YARN-3480:
-------------------------------

lgtm, +1

> Recovery may get very slow with lots of services with lots of app-attempts
> --------------------------------------------------------------------------
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Jun Gong
> Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, YARN-3480.09.patch, YARN-3480.10.patch, YARN-3480.11.patch, YARN-3480.12.patch, YARN-3480.13.patch
>
> When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts, which makes the RM recovery process much slower. It might be better to cap the number of attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts might be very large, so we need to delete some of the attempts stored in the RMStateStore.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072436#comment-15072436 ]

Jian He commented on YARN-4438:
-------------------------------

bq. Not sure I understand why ZKRMStateStore needs to be an AlwaysOn service.
It does not need to be always on; just the zkClient in ZKRMStateStore needs to be always on.

bq. How would this change look?
At first glance, in AdminService#transitionToStandby and transitionToActive, don't call refreshAll if the shared-storage-config-provider is not enabled.

bq. Is the concern that Curator may be biased in picking an RM in certain conditions?
Yeah, that's just my guess. Immediately rejoining may give it a better chance of taking leadership again. ActiveStandbyElector#reJoinElectionAfterFailureToBecomeActive has similar comments.

bq. If leaderLatch.close() throws an exception, when does Curator realize the RM is not participating in the election anymore?
Based on my understanding, Curator will realize it when it does not hear from the RM for the zkSessionTimeout period. Essentially, the zkClient on the RM side will keep retrying to notify the ZK quorum that this client is closed. If close succeeds, the ZK quorum is notified immediately and re-elects a leader. If close keeps retrying beyond zkSessionTimeout, the ZK quorum will assume the client died and re-elect a leader.

bq. we might not need that thread.
Then we can remove this thread? I'll do that separately if you agree.

bq. What happens if this RM is subsequently elected leader? Does the transition to Active succeed just fine?
I think it can transition to active the next time it is elected leader. The previous failure will most likely have happened in refreshAcl.

bq. I would like for us to err on the side of caution and do null-checks.
Will do.

> Implement RM leader election with curator
> -----------------------------------------
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Jian He
> Assignee: Jian He
> Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch
>
> This is to implement leader election with Curator instead of the ActiveStandbyElector from the common package; this also avoids adding more configs in common to suit the RM's own needs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4518) [YARN-3368] Support rendering statistic-by-node-label for queues/apps page
[ https://issues.apache.org/jira/browse/YARN-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072463#comment-15072463 ] Wangda Tan commented on YARN-4518: -- It's yours :), thanks! > [YARN-3368] Support rendering statistic-by-node-label for queues/apps page > -- > > Key: YARN-4518 > URL: https://issues.apache.org/jira/browse/YARN-4518 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask
[ https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072295#comment-15072295 ]

Wangda Tan commented on YARN-4335:
----------------------------------

Thanks [~kkaranasos]. The patch generally looks good to me; some nits:
- The Javadoc of getExecutionType is wrong.
- I would prefer to mark the new APIs as unstable so we can update them before the feature becomes stable.
- Javadoc of ExecutionType: if we use ExecutionType for resource requests, I would suggest adding a description stating that the scheduler can use it to decide whether idle resources may be used for the request.

> Allow ResourceRequests to specify ExecutionType of a request ask
> ----------------------------------------------------------------
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Konstantinos Karanasos
> Assignee: Konstantinos Karanasos
> Attachments: YARN-4335-yarn-2877.001.patch
>
> YARN-2882 introduced container types that are internal (not user-facing) and are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that are used by the AM to specify the type of a ResourceRequest.
> We will initially support two resource request types: CONSERVATIVE and OPTIMISTIC.
> CONSERVATIVE resource requests will be handed internally to containers of GUARANTEED type, whereas OPTIMISTIC resource requests will be handed to QUEUEABLE containers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072298#comment-15072298 ] Wangda Tan commented on YARN-2885: -- Hi [~asuresh], Thanks for updating. Looked at latest patch, I majorly looked at configuration changes and codes interact with existing RM components. some comments: - Do you have real use case that distributed scheduler needs to set different properties such as DIST_SCHEDULING_MIN_MEMORY? Since MIN_MEMORY is a property that AM needs to know (for purpose of calculating how much resources to request), we need to tell AM when MIN_MEMORY of local RM is different from central RM. I would suggest to use central RM's settings for MIN_MEMORY, etc. if you don't have real use case for now. - First constructor of ApplicationMasterService, should use {{name}} instead of {{ApplicationMasterService.class.getName()}}? - You can add a isDistributedSchedulingEnabled method to YarnConfiguration to avoid duplicated logic like: {code} 314 boolean isDistSchedulingEnabled = 315 conf.getBoolean(YarnConfiguration.DIST_SCHEDULING_ENABLED, 316 YarnConfiguration.DIST_SCHEDULING_ENABLED_DEFAULT); {code} > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch, > YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, > YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, > YARN-2885-yarn-2877.v4.patch, YARN-2885_api_changes.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
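The helper suggested in the review above could look roughly like this. This is a minimal sketch against a stand-in configuration class, not the actual YarnConfiguration: the key string and the class shape are assumptions made for illustration; only the constant names come from the comment.

```java
import java.util.HashMap;
import java.util.Map;

class ConfSketch {
    // Names mirror the YarnConfiguration constants cited in the review
    // comment; the key string and default value here are placeholders.
    static final String DIST_SCHEDULING_ENABLED = "yarn.distributed-scheduling.enabled";
    static final boolean DIST_SCHEDULING_ENABLED_DEFAULT = false;

    private final Map<String, String> props = new HashMap<>();

    void setBoolean(String key, boolean value) {
        props.put(key, Boolean.toString(value));
    }

    // The proposed helper: callers ask one question instead of repeating
    // the key/default pair at every call site.
    boolean isDistributedSchedulingEnabled() {
        String v = props.get(DIST_SCHEDULING_ENABLED);
        return v == null ? DIST_SCHEDULING_ENABLED_DEFAULT : Boolean.parseBoolean(v);
    }
}
```

The point of the refactor is simply that the key and its default live in one place, so a future rename cannot leave a stale lookup behind.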
[jira] [Created] (YARN-4517) [YARN-3368] Add nodes page
Wangda Tan created YARN-4517: Summary: [YARN-3368] Add nodes page Key: YARN-4517 URL: https://issues.apache.org/jira/browse/YARN-4517 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan We need a nodes page added to the next-generation web UI, similar to the existing RM/nodes page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM
[ https://issues.apache.org/jira/browse/YARN-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072373#comment-15072373 ] Sunil G commented on YARN-4515: --- Hi [~leftnoteasy] I will try taking this ticket. Please let me know if that's fine. > [YARN-3368] Support hosting web UI framework inside YARN RM > --- > > Key: YARN-4515 > URL: https://issues.apache.org/jira/browse/YARN-4515 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > > Currently it can be only launched outside of YARN, we should make it runnable > inside YARN for easier testing and we should have a configuration to > enable/disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM
[ https://issues.apache.org/jira/browse/YARN-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-4515: - Assignee: Sunil G > [YARN-3368] Support hosting web UI framework inside YARN RM > --- > > Key: YARN-4515 > URL: https://issues.apache.org/jira/browse/YARN-4515 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > > Currently it can be only launched outside of YARN, we should make it runnable > inside YARN for easier testing and we should have a configuration to > enable/disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM
[ https://issues.apache.org/jira/browse/YARN-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072461#comment-15072461 ] Wangda Tan commented on YARN-4515: -- Sure! Please go ahead. Thanks, > [YARN-3368] Support hosting web UI framework inside YARN RM > --- > > Key: YARN-4515 > URL: https://issues.apache.org/jira/browse/YARN-4515 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > > Currently it can be only launched outside of YARN, we should make it runnable > inside YARN for easier testing and we should have a configuration to > enable/disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072279#comment-15072279 ] Bikas Saha commented on YARN-1011: -- In my prior experience, something like this is not practical without proactive CPU management (which has been delegated to future work in the document). It is essential to run opportunistic tasks at lower OS CPU priority so that they never obstruct the progress of normal tasks. Typically we will find that the gap between allocation and actual usage is largest for CPU, since most processing has bursty CPU. When a normal task has a CPU burst, it should not have to contend with an opportunistic task, since that would be detrimental to the expected performance of the normal task. Without this, jobs will not run predictably in the cluster. From what I have seen, users prefer predictability over most other things, i.e. having a 1-min job run in 1 min all the time vs. making that job run in 30s 85% of the time but in 2 mins 5% of the time, because the latter makes it really hard to establish SLAs. In fact, this is the litmus test for opportunistic scheduling: it should be able to raise the utilization of a cluster from (say) 50% to (say) 75% without affecting the latency of the jobs compared to when the cluster was running at 50%. For memory, in fact, it's OK to share and reach 100% capacity, but it's important to check that the machine does not start thrashing. Most well-written tasks will run within their memory limits and start spilling etc. Opportunistic tasks are trying to occupy the memory that these tasks thought they could use but are not using (or that these tasks are keeping in buffer on purpose). The crucial thing to consider here is to look for stats that signify the onset of memory paging activity (or overall memory over-subscription at the OS level). At that point, even normal tasks that are within their limit will be adversely affected because the OS will start paging memory to disk. 
So we need to start proactively killing opportunistic tasks before such paging activity gets triggered. Handling opportunistic tasks raises questions about the involvement of the AMs. Unless I missed something, this is not called out clearly in the doc. In that sense, it would be instructive to consider opportunistic scheduling in a similar light as preemption: the app got a container that it should not have gotten at that time if we had been strict, but got it because we decided to loosen the strings (of queue capacity or machine capacity, respectively). - Will opportunistic containers be given only for requests that are beyond queue capacity, so that we don't break any guarantees on container liveness? I.e., an AM will not expect to lose any container that is within its queue capacity, but opportunistic containers can be killed at any time. - Does the AM need to know that a newly allocated container was opportunistic, e.g. so that it does not schedule the highest-priority work on that container? - Will conversion of opportunistic containers to regular containers be done automatically by the RM? Will the RM notify the AM about such conversions? - When terminating opportunistic containers, will the RM ask the AM which containers to kill? Given the above perf-related scenarios, this may not be a viable option. > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > Attachments: yarn-1011-design-v0.pdf > > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
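The paging-onset check described in the comment above could be sketched as follows. This is a minimal illustration under assumed names, not anything that ships in the NM; the counter source (e.g. the cumulative pgmajfault value from /proc/vmstat) is an assumption for the sketch.

```java
// Watch an OS-level paging counter and signal that opportunistic containers
// should start being killed once the major-fault rate crosses a threshold,
// before normal tasks are adversely affected.
class PagingGuard {
    private final long faultRateThreshold; // major faults allowed per interval
    private long lastMajorFaults = -1;     // previous cumulative sample

    PagingGuard(long faultRateThreshold) {
        this.faultRateThreshold = faultRateThreshold;
    }

    // Feed one cumulative sample per monitoring interval (e.g. per NM
    // heartbeat); returns true when the per-interval fault rate indicates
    // the onset of paging activity.
    boolean onSample(long majorFaults) {
        if (lastMajorFaults < 0) {         // first sample only primes the state
            lastMajorFaults = majorFaults;
            return false;
        }
        long rate = majorFaults - lastMajorFaults;
        lastMajorFaults = majorFaults;
        return rate > faultRateThreshold;
    }
}
```

The design choice here matches the comment: the trigger is the rate of change of a system-wide paging stat, not per-container memory usage, because paging hurts even tasks that are within their own limits.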
[jira] [Updated] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4183: Attachment: YARN-4183.v1.001.patch [~sjlee0], attaching a patch as per previous description, please have a look > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Naganarasimha G R > Attachments: YARN-4183.1.patch, YARN-4183.v1.001.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4514) [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses
[ https://issues.apache.org/jira/browse/YARN-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-4514: --- Assignee: Naganarasimha G R > [YARN-3368] Cleanup hardcoded configurations, such as RM/ATS addresses > -- > > Key: YARN-4514 > URL: https://issues.apache.org/jira/browse/YARN-4514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Naganarasimha G R > > We have several configurations that are hard-coded, for example RM/ATS > addresses; we should make them configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4315) NaN in Queue percentage for cluster apps page
[ https://issues.apache.org/jira/browse/YARN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072292#comment-15072292 ] Wangda Tan commented on YARN-4315: -- Looks good, +1, will commit shortly. Thanks [~bibinchundatt]. > NaN in Queue percentage for cluster apps page > - > > Key: YARN-4315 > URL: https://issues.apache.org/jira/browse/YARN-4315 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4315.patch, 0002-YARN-4315.patch, Snap1.jpg > > > Steps to reproduce > Submit application > Switch RM and check the percentage of queue usage > Queue percentage shown as NaN -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4516) [YARN-3368] Use em-table to better render tables
Wangda Tan created YARN-4516: Summary: [YARN-3368] Use em-table to better render tables Key: YARN-4516 URL: https://issues.apache.org/jira/browse/YARN-4516 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Currently we're using DataTables, which isn't integrated with Ember.js very well. Instead we can use em-table (which was created for the Tez UI). It supports features such as selectable columns, pagination, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM
Wangda Tan created YARN-4515: Summary: [YARN-3368] Support hosting web UI framework inside YARN RM Key: YARN-4515 URL: https://issues.apache.org/jira/browse/YARN-4515 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Currently it can only be launched outside of YARN; we should make it runnable inside YARN for easier testing, and we should have a configuration to enable/disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4516) [YARN-3368] Use em-table to better render tables
[ https://issues.apache.org/jira/browse/YARN-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4516: - Description: Currently we're using DataTables, it isn't integrated to Ember.js very well. Instead we can use em-table (see https://github.com/sreenaths/em-table/wiki, which is created for Tez UI). It supports features such as selectable columns, pagination, etc. was: Currently we're using DataTables, it isn't integrated to Ember.js very well. Instead we can use em-table (which is created for Tez UI). It supports features such as selectable columns, pagination, etc. > [YARN-3368] Use em-table to better render tables > > > Key: YARN-4516 > URL: https://issues.apache.org/jira/browse/YARN-4516 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan > > Currently we're using DataTables, it isn't integrated to Ember.js very well. > Instead we can use em-table (see https://github.com/sreenaths/em-table/wiki, > which is created for Tez UI). It supports features such as selectable > columns, pagination, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4516) [YARN-3368] Use em-table to better render tables
[ https://issues.apache.org/jira/browse/YARN-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072307#comment-15072307 ] Wangda Tan commented on YARN-4516: -- Thanks [~Sreenath] for creating em-table. > [YARN-3368] Use em-table to better render tables > > > Key: YARN-4516 > URL: https://issues.apache.org/jira/browse/YARN-4516 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan > > Currently we're using DataTables, it isn't integrated to Ember.js very well. > Instead we can use em-table (see https://github.com/sreenaths/em-table/wiki, > which is created for Tez UI). It supports features such as selectable > columns, pagination, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4520) FinishAppEvent is leaked in leveldb if no app's container running on this node
sandflee created YARN-4520: -- Summary: FinishAppEvent is leaked in leveldb if no app's container running on this node Key: YARN-4520 URL: https://issues.apache.org/jira/browse/YARN-4520 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: sandflee Assignee: sandflee Once we restart the nodemanager we see many logs like: 2015-12-28 11:59:18,725 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1446103803043_9892 We find that the app's containers were never started on the NM but were released by the AM after being allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3368) [Umbrella] YARN web UI: Next generation
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3368: - Summary: [Umbrella] YARN web UI: Next generation (was: [Umbrella] Improve YARN web UI) > [Umbrella] YARN web UI: Next generation > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015) > yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continues to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. We can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072326#comment-15072326 ] Wangda Tan commented on YARN-3215: -- Hi [~Naganarasimha], Thanks for considering this. One idea in my mind: can we return the headroom for all partitions *requested by the application* to the application? Returning the queue's total available resources to the app could over-estimate headroom, since a queue can use many partitions while the app may only want one. Thoughts? > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Naganarasimha G R > > In existing CapacityScheduler, when computing headroom of an application, it > will only consider "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
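The per-requested-partition idea above can be stated as a tiny sketch. The class, method, and map shapes are illustrative assumptions, not the CapacityScheduler API; the only point carried over from the comment is that partitions the app never asked for are omitted from the headroom it receives.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class HeadroomSketch {
    // queueAvailable maps partition label -> available memory (MB) in the
    // queue; only partitions the app actually requested are reported back,
    // avoiding the over-estimate described in the comment.
    static Map<String, Long> headroomFor(Map<String, Long> queueAvailable,
                                         Set<String> requestedPartitions) {
        Map<String, Long> result = new HashMap<>();
        for (String p : requestedPartitions) {
            result.put(p, queueAvailable.getOrDefault(p, 0L));
        }
        return result;
    }

    // Tiny demo: the queue spans the default, "red" and "blue" partitions,
    // but the app asked only for "red", so only "red" headroom is returned.
    static long demoRedHeadroomMb() {
        Map<String, Long> avail = new HashMap<>();
        avail.put("", 4096L);
        avail.put("red", 5120L);
        avail.put("blue", 9999L);
        return headroomFor(avail, Collections.singleton("red")).get("red");
    }
}
```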
[jira] [Commented] (YARN-4518) [YARN-3368] Support rendering statistic-by-node-label for queues/apps page
[ https://issues.apache.org/jira/browse/YARN-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072374#comment-15072374 ] Sunil G commented on YARN-4518: --- Hi [~leftnoteasy] I will also give this ticket a shot. Please let me know if that's fine. > [YARN-3368] Support rendering statistic-by-node-label for queues/apps page > -- > > Key: YARN-4518 > URL: https://issues.apache.org/jira/browse/YARN-4518 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072414#comment-15072414 ] Jian He commented on YARN-4138: --- I think it may be true that this will lead to deadlock. - CapacityScheduler#allocateContainersToNode will grab the scheduler lock and then the SchedulerApp's lock at LeafQueue#assignContainers. - CapacityScheduler#rollbackContainerResource first acquires the SchedulerApp's lock and then the scheduler lock. -- This will also happen when the AM calls CapacityScheduler#allocate to decrease the container. This was introduced in YARN-1651. I had a [comment|https://issues.apache.org/jira/browse/YARN-1651?focusedCommentId=14738568=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14738568] earlier that every AM allocate call will hold the scheduler's and queue's locks, which is too expensive, but I missed that this may lead to deadlock. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
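The inverted lock order described above is the classic two-lock deadlock. The sketch below is a toy illustration, not CapacityScheduler code: both paths are shown with the order already made consistent (scheduler lock before the per-app lock), which is the invariant that prevents the cycle.

```java
import java.util.concurrent.CountDownLatch;

class LockOrderDemo {
    private final Object schedulerLock = new Object();
    private final Object appLock = new Object();

    // Allocation path: scheduler lock, then app lock (as in the comment).
    void allocatePath(Runnable work) {
        synchronized (schedulerLock) {
            synchronized (appLock) { work.run(); }
        }
    }

    // Rollback path, with the order fixed to match allocatePath. The buggy
    // version acquired appLock first, allowing the cycle described above.
    void rollbackPath(Runnable work) {
        synchronized (schedulerLock) {
            synchronized (appLock) { work.run(); }
        }
    }

    // Run both paths concurrently; with a consistent lock order both threads
    // always complete, so this returns true.
    static boolean runBoth() {
        LockOrderDemo d = new LockOrderDemo();
        CountDownLatch done = new CountDownLatch(2);
        Thread a = new Thread(() -> d.allocatePath(done::countDown));
        Thread b = new Thread(() -> d.rollbackPath(done::countDown));
        a.start();
        b.start();
        try {
            a.join(2000);
            b.join(2000);
        } catch (InterruptedException e) {
            return false;
        }
        return done.getCount() == 0; // both critical sections completed
    }
}
```

If rollbackPath instead took appLock first, thread A could hold schedulerLock while waiting on appLock and thread B the reverse, and neither would ever finish.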
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072426#comment-15072426 ] MENG DING commented on YARN-4138: - Releasing containers may have the same issue too. Strange that there have been no reports from the field so far. Looks like we need to implement a pending release/decrease list in the scheduler ... > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072430#comment-15072430 ] sandflee commented on YARN-4138: When releasing containers, we don't hold the SchedulerApp's lock. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4518) [YARN-3368] Support rendering statistic-by-node-label for queues/apps page
Wangda Tan created YARN-4518: Summary: [YARN-3368] Support rendering statistic-by-node-label for queues/apps page Key: YARN-4518 URL: https://issues.apache.org/jira/browse/YARN-4518 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072314#comment-15072314 ] sandflee commented on YARN-4138: Hi [~mding], I'll open a new jira to track this, so as not to delay this issue. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4516) [YARN-3368] Use em-table to better render tables
[ https://issues.apache.org/jira/browse/YARN-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072334#comment-15072334 ] Li Lu commented on YARN-4516: - Hi [~leftnoteasy], [~Sreenath], if nobody is currently working on this item, maybe I can work on it to fine-tune the tables in the ATS v2 branch? Thanks! > [YARN-3368] Use em-table to better render tables > > > Key: YARN-4516 > URL: https://issues.apache.org/jira/browse/YARN-4516 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan > > Currently we're using DataTables, it isn't integrated to Ember.js very well. > Instead we can use em-table (see https://github.com/sreenaths/em-table/wiki, > which is created for Tez UI). It supports features such as selectable > columns, pagination, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid
[ https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072387#comment-15072387 ] sandflee commented on YARN-4495: The RM will pass an InvalidResourceRequestException to the AM under the conditions below: * deduped containerChangeRequest * invalid ContainerChangeRequest (requested container size < 0 or > max) * rmContainer == null * rmContainer.state != RUNNING * increase request targetResource < allocatedResource, or decrease request targetResource > allocatedResource * nodeResource < increase request targetResource This will bring AMRMClientAsync down, which in turn brings the AM down. It's not user friendly, especially since some of the conditions are out of the AM's control: * rmContainer == null: maybe the RM is recovering, and the corresponding RMContainer has not been recovered yet. * rmContainer.state != RUNNING: maybe the container has completed and the completion msg has not been pulled by the AM yet. * increase request targetResource < allocatedResource, or decrease request targetResource > allocatedResource: 1. The AM requests a resource increase 1G -> 10G; the request couldn't be satisfied and is pending. 2. After a while, the AM sends a new resource increase request, 1G -> 5G. 3. The 10G request is satisfied, so the RMContainer's allocated resource becomes 10G by the time the new increase request reaches the RM. 4. The RM checks the increase request and finds the target resource is less than the RMContainer's allocated resource. * nodeResource < increase request targetResource: the AM knows nothing about node resources; this should be covered by maximumAllocation. Also, the scheduler may drop a container resize request if the target resource equals the RMContainer's allocated resource; the problem is that the AM knows nothing about container resource normalization. So if the AM requests a resource decrease 8G -> 7.5G, and 7.5G is normalized to 8G, the RM will drop this request and leave the AM waiting for the reply. 
So, all in all, I suggest adding a msg to AllocateResponse instead of throwing InvalidResourceRequestException or dropping the change request. Hoping for your comments and suggestions! > add a way to tell AM container increase/decrease request is invalid > --- > > Key: YARN-4495 > URL: https://issues.apache.org/jira/browse/YARN-4495 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee > > now RM may pass InvalidResourceRequestException to AM or just ignore the > change request, the former will cause AMRMClientAsync down. and the latter > will leave AM waiting for the reply. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
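The normalization case in the comment above (a decrease to 7.5G that rounds back to the allocated 8G and gets silently dropped) can be sketched as follows. The rounding step is an assumption standing in for the scheduler's minimum/increment allocation; the names are illustrative.

```java
class NormalizeSketch {
    // Round a requested size up to a multiple of stepMb, which stands in
    // for the scheduler's minimum/increment allocation in this sketch.
    static long normalizeMb(long requestedMb, long stepMb) {
        return ((requestedMb + stepMb - 1) / stepMb) * stepMb;
    }

    // True when a decrease request normalizes back to the currently
    // allocated size: the case where the RM may silently drop the request
    // and leave the AM waiting for a reply that never comes.
    static boolean decreaseIsNoop(long allocatedMb, long targetMb, long stepMb) {
        return normalizeMb(targetMb, stepMb) == allocatedMb;
    }
}
```

With a 1024 MB step, a decrease from 8192 MB to 7680 MB (8G to 7.5G) normalizes back to 8192 MB, so the request is a no-op from the RM's perspective even though the AM expects an answer.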
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072303#comment-15072303 ] Naganarasimha G R commented on YARN-4183: - I can update the title if the fix is fine. > Enabling generic application history forces every job to get a timeline > service delegation token > > > Key: YARN-4183 > URL: https://issues.apache.org/jira/browse/YARN-4183 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Naganarasimha G R > Attachments: YARN-4183.1.patch, YARN-4183.v1.001.patch > > > When enabling just the Generic History Server and not the timeline server, > the system metrics publisher will not publish the events to the timeline > store as it checks if the timeline server and system metrics publisher are > enabled before creating a timeline client. > To make it work, if the timeline service flag is turned on, it will force > every yarn application to get a delegation token. > Instead of checking if timeline service is enabled, we should be checking if > application history server is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072302#comment-15072302 ] Wangda Tan commented on YARN-1011: -- Thanks, bq. Makes sense. Doubt there is any scheduler-specific smarts here. If at all we need to do them separately, it is most likely because our scheduler abstractions are not clean. Agree! > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun C Murthy > Attachments: yarn-1011-design-v0.pdf > > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3368) [Umbrella] YARN web UI: Next generation
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072309#comment-15072309 ] Wangda Tan commented on YARN-3368: -- I just created several sub-tasks. Please feel free to assign one to yourself if you are interested. Thanks! > [Umbrella] YARN web UI: Next generation > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015) > yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continues to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. We can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token
[ https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072427#comment-15072427 ] Hadoop QA commented on YARN-4183: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} compile {color} | {color:green} 1m 42s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 6s {color} | {color:green} hadoop-yarn-site in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 8s {color} | {color:green} hadoop-yarn-site in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 57s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12779596/YARN-4183.v1.001.patch | | JIRA Issue | YARN-4183 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml | | uname | Linux 56b10cd8b7e5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / fb00794 | | Default Java |
[jira] [Updated] (YARN-4520) FinishAppEvent is leaked in leveldb if no app's container running on this node
[ https://issues.apache.org/jira/browse/YARN-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4520: --- Attachment: YARN-4520.01.patch > FinishAppEvent is leaked in leveldb if no app's container running on this node > -- > > Key: YARN-4520 > URL: https://issues.apache.org/jira/browse/YARN-4520 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4520.01.patch > > > Once we restart the NodeManager, we see many logs like: > 2015-12-28 11:59:18,725 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Event EventType: FINISH_APPLICATION sent to absent application > application_1446103803043_9892 > We find that the app's containers were never started on the NM but were released by the AM > after being allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072478#comment-15072478 ] Rohith Sharma K S commented on YARN-4393: - Varun's analysis makes sense to me: we need not add dispatcher.await everywhere. We can use dispatcher.await only when asserting on an event type, or when asserting some functionality (such as resulting values) after the dispatcher has processed that event. > TestResourceLocalizationService#testFailedDirsResourceRelease fails > intermittently > -- > > Key: YARN-4393 > URL: https://issues.apache.org/jira/browse/YARN-4393 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4393.01.patch > > > [~ozawa] pointed out this failure on YARN-4380. > Check > https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773 > {noformat} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.093 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! 
Wanted: > eventHandler.handle( > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > Actual invocation has different arguments: > eventHandler.handle( > EventType: APPLICATION_INITED > ); > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
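The intermittent failure above is a race: the test thread runs its Mockito verification while YARN's AsyncDispatcher is still handling the event on its own thread. A minimal, self-contained sketch of why an await/drain step belongs right before the assertion (this is a toy queue written for illustration, not YARN's actual AsyncDispatcher or DrainDispatcher API):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy async dispatcher: events are handled on a separate thread, so a test
// that asserts immediately after dispatch() races the handler thread.
class TinyDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger pending = new AtomicInteger();

    TinyDispatcher() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    queue.take().run();
                    pending.decrementAndGet();
                }
            } catch (InterruptedException ignored) {
                // toy helper: just exit on interrupt
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    void dispatch(Runnable event) {
        pending.incrementAndGet();
        queue.add(event);
    }

    // Analogue of an await()/drain step in the test utilities: spin until
    // every dispatched event has actually been handled.
    void await() {
        while (pending.get() > 0) {
            try { Thread.sleep(1); } catch (InterruptedException ignored) { }
        }
    }
}

public class AwaitDemo {
    public static void main(String[] args) {
        TinyDispatcher d = new TinyDispatcher();
        StringBuilder handled = new StringBuilder();
        d.dispatch(() -> handled.append("APPLICATION_INITED"));
        d.await(); // without this, the check below may run before the handler
        System.out.println(handled); // APPLICATION_INITED
    }
}
```

This mirrors Rohith's suggestion: call await() only right before the assertions that depend on the event having been processed, rather than after every single dispatch.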
[jira] [Updated] (YARN-4482) Default values of several config parameters are missing
[ https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Shahid Khan updated YARN-4482: --- Assignee: (was: Mohammad Shahid Khan) > Default values of several config parameters are missing > > > Key: YARN-4482 > URL: https://issues.apache.org/jira/browse/YARN-4482 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.6.2, 2.6.3 >Reporter: Tianyin Xu >Priority: Minor > > In {{yarn-default.xml}}, the default values of the following parameters are > commented out, > {{yarn.client.failover-max-attempts}} > {{yarn.client.failover-sleep-base-ms}} > {{yarn.client.failover-sleep-max-ms}} > Are these default values changed (I suppose so)? If so, we should update the > new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" > values... > (yarn-default.xml) > https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
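The shape of the entries the reporter is asking to have restored in {{yarn-default.xml}} would look like the sketch below. The actual default values are exactly what this issue says is unknown, so they are deliberately left as placeholders rather than guessed:

```xml
<!-- Sketch only: the real defaults are what this JIRA asks to be documented. -->
<property>
  <name>yarn.client.failover-max-attempts</name>
  <value><!-- unknown default --></value>
</property>
<property>
  <name>yarn.client.failover-sleep-base-ms</name>
  <value><!-- unknown default --></value>
</property>
<property>
  <name>yarn.client.failover-sleep-max-ms</name>
  <value><!-- unknown default --></value>
</property>
```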
[jira] [Commented] (YARN-4516) [YARN-3368] Use em-table to better render tables
[ https://issues.apache.org/jira/browse/YARN-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072526#comment-15072526 ] Sreenath Somarajapuram commented on YARN-4516: -- [~gtCarrera9] Feel free to take up the task after checking with [~leftnoteasy]. Please let me know if you need any help with em-table. > [YARN-3368] Use em-table to better render tables > > > Key: YARN-4516 > URL: https://issues.apache.org/jira/browse/YARN-4516 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan > > Currently we're using DataTables, which isn't well integrated with Ember.js. > Instead we can use em-table (see https://github.com/sreenaths/em-table/wiki, > which is created for Tez UI). It supports features such as selectable > columns, pagination, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4482) Default values of several config parameters are missing
[ https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072484#comment-15072484 ] Mohammad Shahid Khan commented on YARN-4482: Hi [~Tianyin Xu], I agree with you. We can mark this JIRA as Won't Fix. > Default values of several config parameters are missing > > > Key: YARN-4482 > URL: https://issues.apache.org/jira/browse/YARN-4482 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Affects Versions: 2.6.2, 2.6.3 >Reporter: Tianyin Xu >Priority: Minor > > In {{yarn-default.xml}}, the default values of the following parameters are > commented out, > {{yarn.client.failover-max-attempts}} > {{yarn.client.failover-sleep-base-ms}} > {{yarn.client.failover-sleep-max-ms}} > Are these default values changed (I suppose so)? If so, we should update the > new ones in {{yarn-default.xml}}. Right now, I don't know the real "default" > values... > (yarn-default.xml) > https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072480#comment-15072480 ] MENG DING commented on YARN-4138: - You are right, I remembered that wrong. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4520) FinishAppEvent is leaked in leveldb if no app's container running on this node
[ https://issues.apache.org/jira/browse/YARN-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072489#comment-15072489 ] Hadoop QA commented on YARN-4520: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 45s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc 
{color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager (total was 53, now 54). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 25s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 1s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 32s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12779609/YARN-4520.01.patch | | JIRA Issue | YARN-4520 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux d4a033bcd0a9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3
[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
[ https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072481#comment-15072481 ] Sunil G commented on YARN-4352: --- ASF warnings are not related to this patch. This change is a test-case fix, so test coverage seems fine. [~rohithsharma], is it OK? > Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient > > > Key: YARN-4352 > URL: https://issues.apache.org/jira/browse/YARN-4352 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > Labels: security > Attachments: 0001-YARN-4352.patch, 0002-YARN-4352.patch > > > From > https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt, > we can see the tests in TestYarnClient, TestAMRMClient and TestNMClient get > timeout which can be reproduced locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4520) FinishAppEvent is leaked in leveldb if no app's container running on this node
[ https://issues.apache.org/jira/browse/YARN-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-4520: --- Attachment: YARN-4520.02.patch Fixed checkstyle errors. > FinishAppEvent is leaked in leveldb if no app's container running on this node > -- > > Key: YARN-4520 > URL: https://issues.apache.org/jira/browse/YARN-4520 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee > Attachments: YARN-4520.01.patch, YARN-4520.02.patch > > > Once we restart the NodeManager, we see many logs like: > 2015-12-28 11:59:18,725 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Event EventType: FINISH_APPLICATION sent to absent application > application_1446103803043_9892 > We find that the app's containers were never started on the NM but were released by the AM > after being allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4330) MiniYARNCluster prints multiple Failed to instantiate default resource calculator warning messages
[ https://issues.apache.org/jira/browse/YARN-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072148#comment-15072148 ] Inigo Goiri commented on YARN-4330: --- [~ste...@apache.org], is this good to go? > MiniYARNCluster prints multiple Failed to instantiate default resource > calculator warning messages > --- > > Key: YARN-4330 > URL: https://issues.apache.org/jira/browse/YARN-4330 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn >Affects Versions: 2.8.0 > Environment: OSX, JUnit >Reporter: Steve Loughran >Assignee: Varun Saxena >Priority: Blocker > Attachments: YARN-4330.01.patch > > > Whenever I try to start a MiniYARNCluster on Branch-2 (commit #0b61cca), I > see multiple stack traces warning me that a resource calculator plugin could > not be created > {code} > (ResourceCalculatorPlugin.java:getResourceCalculatorPlugin(184)) - > java.lang.UnsupportedOperationException: Could not determine OS: Failed to > instantiate default resource calculator. > java.lang.UnsupportedOperationException: Could not determine OS > {code} > This is a minicluster. It doesn't need resource calculation. It certainly > doesn't need test logs being cluttered with even more stack traces which will > only generate false alarms about tests failing. > There needs to be a way to turn this off, and the minicluster should have it > that way by default. > Being ruthless and marking as a blocker, because it's a fairly major > regression for anyone testing with the minicluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
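One direction such a fix could take, sketched as a self-contained toy (this is an assumption for illustration, not the actual YARN-4330 patch nor Hadoop's real ResourceCalculatorPlugin API): resolve the calculator once, warn at most once, and hand back null, so a MiniYARNCluster starting several NodeManagers does not print a stack trace per NM:

```java
// Toy model: log one short warning for unsupported OSes instead of a
// stack trace for every NodeManager the minicluster starts.
public class CalcDemo {
    private static boolean warned = false;

    // Returns a calculator handle for supported OSes, null otherwise.
    static Object getCalculatorOrNull(String os) {
        if (!os.equals("Linux") && !os.equals("Windows")) {
            if (!warned) {
                System.err.println("No resource calculator for OS " + os
                        + "; resource monitoring disabled");
                warned = true; // warn once, not once per NM
            }
            return null;
        }
        return new Object(); // stand-in for a real calculator instance
    }

    public static void main(String[] args) {
        // Simulate three NMs starting on OSX: only one warning is emitted.
        for (int i = 0; i < 3; i++) {
            System.out.println(getCalculatorOrNull("Mac OS X") == null);
        }
    }
}
```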
[jira] [Updated] (YARN-4315) NaN in Queue percentage for cluster apps page
[ https://issues.apache.org/jira/browse/YARN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4315: --- Attachment: 0002-YARN-4315.patch [~leftnoteasy] Thank you for looking into the patch. Updated the patch based on your comments; please review. > NaN in Queue percentage for cluster apps page > - > > Key: YARN-4315 > URL: https://issues.apache.org/jira/browse/YARN-4315 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4315.patch, 0002-YARN-4315.patch, Snap1.jpg > > > Steps to reproduce > Submit application > Switch RM and check the percentage of queue usage > Queue percentage shown as NaN -- This message was sent by Atlassian JIRA (v6.3.4#6332)
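For context on the symptom: a NaN like this typically comes from a usage percentage computed as used/total while total is still 0, e.g. right after an RM switch before cluster resources are re-registered. A hypothetical guard (a sketch of the general fix shape, not the actual RM webapp code or the attached patch):

```java
// Hypothetical helper: guard the division so the UI renders 0% instead of
// NaN while total cluster capacity is still unknown (e.g. after failover).
public class QueuePercent {
    static float usagePercent(long used, long total) {
        if (total <= 0) {
            return 0.0f; // capacity not yet known: show 0%, not NaN
        }
        return used * 100.0f / total;
    }

    public static void main(String[] args) {
        System.out.println(usagePercent(0, 0));      // 0.0, not NaN
        System.out.println(usagePercent(512, 2048)); // 25.0
    }
}
```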