[jira] [Updated] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang updated YARN-8795: -- Attachment: YARN-8795.003.patch > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > Attachments: YARN-8795.002.patch, YARN-8795.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620131#comment-16620131 ] Hadoop QA commented on YARN-8795: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 21s{color} | {color:red} YARN-8795 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8795 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940350/YARN-8795.002.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21875/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > Attachments: YARN-8795.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang updated YARN-8795: -- Attachment: YARN-8795.002.patch > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > Attachments: YARN-8795.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang updated YARN-8795: -- Attachment: (was: YARN-8795.001.patch) > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8793) QueuePlacementPolicy bind more information to assgining result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang updated YARN-8793: -- Attachment: (was: YARN-8793.001.patch) > QueuePlacementPolicy bind more information to assgining result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8793) QueuePlacementPolicy bind more information to assgining result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang updated YARN-8793: -- Attachment: YARN-8793.001.patch > QueuePlacementPolicy bind more information to assgining result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620102#comment-16620102 ] Hadoop QA commented on YARN-8795: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8795 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8795 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940347/YARN-8795.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21873/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > Attachments: YARN-8795.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8777) Container Executor C binary change to execute interactive docker command
[ https://issues.apache.org/jira/browse/YARN-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620077#comment-16620077 ] Eric Yang commented on YARN-8777: -
[~Zian Chen]
{quote} The method param list have out an outlen which didn't match the signature, and we miss description for param args, is this typo? {quote}
Good catch, I will make the correction.
{quote}we can probably give an enum to index several common used command options, and ask node manager only pass index which can be matched with one of these enum elements, in this way we can have some kind of flexibility without open up bigger attack interface. {quote}
The enum approach can be used for a fixed number of parameters or a small set of parameters. It is probably not an ideal interface for passing arbitrary commands to container-executor for docker exec. One possible danger is sending hex code as argv to trigger a buffer overflow in container-executor or docker, where there is no logic to validate the arbitrary command.
{quote}should we also take care of passing shell commands inside the container ?{quote}
The entire pipeline looks like websocket > node manager > container-executor > docker exec -it bash. Every keystroke is written from the web socket to bash, which interprets the incoming input stream via stdin. All output is written from bash's stdout back to the web socket. This simulates the terminal behavior. There is no need to do additional processing of shell commands with the current arrangement.
> Container Executor C binary change to execute interactive docker command > > > Key: YARN-8777 > URL: https://issues.apache.org/jira/browse/YARN-8777 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8777.001.patch > > > Since Container Executor provides Container execution using the native > container-executor binary, we also need to make changes to accept new > “dockerExec” method to invoke the corresponding native function to execute > docker exec command to the running container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
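To make the relay pipeline described above more concrete, here is a minimal, hypothetical Java sketch of the stdin/stdout pumping between a client connection and an interactive shell. It only illustrates the concept; the actual YARN path goes through the container-executor native binary, and every class, method, and flag below is invented or simplified for the sketch.
{code:java}
import java.io.InputStream;
import java.io.OutputStream;

public class DockerExecRelay {

  /**
   * Relays bytes between a client connection (e.g. a websocket session) and
   * an interactive shell started with "docker exec -i <containerId> bash".
   * Simplified: no TTY allocation, no container-executor involvement.
   */
  public static void relay(String containerId,
                           InputStream fromClient,
                           OutputStream toClient) throws Exception {
    Process shell =
        new ProcessBuilder("docker", "exec", "-i", containerId, "bash")
            .redirectErrorStream(true)  // merge stderr into the returned stream
            .start();

    // Keystrokes from the client go to bash via its stdin ...
    Thread stdinPump = new Thread(() -> pump(fromClient, shell.getOutputStream()));
    // ... and everything bash writes to stdout streams back to the client.
    Thread stdoutPump = new Thread(() -> pump(shell.getInputStream(), toClient));
    stdinPump.start();
    stdoutPump.start();
    shell.waitFor();
  }

  private static void pump(InputStream in, OutputStream out) {
    byte[] buf = new byte[4096];
    try {
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
        out.flush();  // flush per chunk to keep the session interactive
      }
    } catch (Exception e) {
      // connection or process closed; end of relay
    }
  }
}
{code}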
[jira] [Commented] (YARN-8793) QueuePlacementPolicy bind more information to assgining result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620071#comment-16620071 ] Hadoop QA commented on YARN-8793: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-8793 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8793 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940342/YARN-8793.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21872/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > QueuePlacementPolicy bind more information to assgining result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8795) QueuePlacementRule move to separate files
Shuai Zhang created YARN-8795: - Summary: QueuePlacementRule move to separate files Key: YARN-8795 URL: https://issues.apache.org/jira/browse/YARN-8795 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Affects Versions: 3.1.1 Reporter: Shuai Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620042#comment-16620042 ] Tao Yang commented on YARN-8771: Attached v3 patch to improve unit test without adding new "resource-types-1.xml" file. Found the {{yarn.test.reset-resource-types}} configuration item from TestCapacitySchedulerWithMultiResourceTypes, it can avoid reloading resource types in MockRM. > CapacityScheduler fails to unreserve when cluster resource contains empty > resource type > --- > > Key: YARN-8771 > URL: https://issues.apache.org/jira/browse/YARN-8771 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8771.001.patch, YARN-8771.002.patch, > YARN-8771.003.patch > > > We found this problem when cluster is almost but not exhausted (93% used), > scheduler kept allocating for an app but always fail to commit, this can > blocking requests from other apps and parts of cluster resource can't be used. > Reproduce this problem: > (1) use DominantResourceCalculator > (2) cluster resource has empty resource type, for example: gpu=0 > (3) scheduler allocates container for app1 who has reserved containers and > whose queue limit or user limit reached(used + required > limit). > Reference codes in RegularContainerAllocator#assignContainer: > {code:java} > // How much need to unreserve equals to: > // max(required - headroom, amountNeedUnreserve) > Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom()); > Resource resourceNeedToUnReserve = > Resources.max(rc, clusterResource, > Resources.subtract(capability, headRoom), > currentResoureLimits.getAmountNeededUnreserve()); > boolean needToUnreserve = > Resources.greaterThan(rc, clusterResource, > resourceNeedToUnReserve, Resources.none()); > {code} > For example, resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu> when > {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}, > needToUnreserve which is the result of {{Resources#greaterThan}} will be > {{false}}. This is not reasonable because required resource did exceed the > headroom and unreserve is needed. > After that, when reaching the unreserve process in > RegularContainerAllocator#assignContainer, unreserve process will be skipped > when shouldAllocOrReserveNewContainer is true (when required containers > > reserved containers) and needToUnreserve is wrongly calculated to be false: > {code:java} > if (availableContainers > 0) { > if (rmContainer == null && reservationsContinueLooking > && node.getLabels().isEmpty()) { > // unreserve process can be wrongly skipped when > shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required > resource did exceed the headroom > if (!shouldAllocOrReserveNewContainer || needToUnreserve) { > ... > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
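As a rough illustration of the test setup described in the comment above, a unit test could register the zero-capacity "gpu" resource type directly in configuration instead of shipping a resource-types-1.xml file. This is a hedged sketch, not the actual YARN-8771 patch; the helper and constant names (ResourceUtils#resetResourceTypes, YarnConfiguration.RESOURCE_TYPES) are assumptions about trunk at the time.
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.resource.ResourceUtils;

public class GpuResourceTypeTestSetup {

  /** Builds a configuration with an extended "gpu" resource type registered. */
  public static YarnConfiguration newConfWithGpuType() {
    YarnConfiguration conf = new YarnConfiguration();
    // Declare the extended resource type in-line rather than through a
    // separate resource-types-1.xml test resource on the classpath.
    conf.set(YarnConfiguration.RESOURCE_TYPES, "gpu");
    ResourceUtils.resetResourceTypes(conf);
    // Per the comment above, this test-only switch is assumed to stop MockRM
    // from reloading resource types and discarding the registration above.
    conf.setBoolean("yarn.test.reset-resource-types", false);
    return conf;
  }
}
{code}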
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8771: --- Attachment: YARN-8771.003.patch > CapacityScheduler fails to unreserve when cluster resource contains empty > resource type > --- > > Key: YARN-8771 > URL: https://issues.apache.org/jira/browse/YARN-8771 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8771.001.patch, YARN-8771.002.patch, > YARN-8771.003.patch > > > We found this problem when cluster is almost but not exhausted (93% used), > scheduler kept allocating for an app but always fail to commit, this can > blocking requests from other apps and parts of cluster resource can't be used. > Reproduce this problem: > (1) use DominantResourceCalculator > (2) cluster resource has empty resource type, for example: gpu=0 > (3) scheduler allocates container for app1 who has reserved containers and > whose queue limit or user limit reached(used + required > limit). > Reference codes in RegularContainerAllocator#assignContainer: > {code:java} > // How much need to unreserve equals to: > // max(required - headroom, amountNeedUnreserve) > Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom()); > Resource resourceNeedToUnReserve = > Resources.max(rc, clusterResource, > Resources.subtract(capability, headRoom), > currentResoureLimits.getAmountNeededUnreserve()); > boolean needToUnreserve = > Resources.greaterThan(rc, clusterResource, > resourceNeedToUnReserve, Resources.none()); > {code} > For example, resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu> when > {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}, > needToUnreserve which is the result of {{Resources#greaterThan}} will be > {{false}}. This is not reasonable because required resource did exceed the > headroom and unreserve is needed. > After that, when reaching the unreserve process in > RegularContainerAllocator#assignContainer, unreserve process will be skipped > when shouldAllocOrReserveNewContainer is true (when required containers > > reserved containers) and needToUnreserve is wrongly calculated to be false: > {code:java} > if (availableContainers > 0) { > if (rmContainer == null && reservationsContinueLooking > && node.getLabels().isEmpty()) { > // unreserve process can be wrongly skipped when > shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required > resource did exceed the headroom > if (!shouldAllocOrReserveNewContainer || needToUnreserve) { > ... > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
[ https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620022#comment-16620022 ] Hadoop QA commented on YARN-8752: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 36m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8752 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940336/YARN-8752-1.patch | | Optional Tests | dupname asflicense mvnsite | | uname | Linux 7c52c69daeb9 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 17f5651 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 332 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21870/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. 
> yarn-registry.md has wrong word ong-lived,it should be long-lived > - > > Key: YARN-8752 > URL: https://issues.apache.org/jira/browse/YARN-8752 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.0 >Reporter: leiqiang >Priority: Major > Labels: documentation > Attachments: YARN-8752-1.patch > > > In yarn-registry.md line 88, > deploy {color:#FF}ong-lived{color} services instances, this word should > be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1662#comment-1662 ] Weiwei Yang commented on YARN-8468: ---
Thanks [~bsteinbach] for the updates. A general comment on your patch: please minimize the changes in the patch to keep it neat. Format changes, switched lines, etc. should be avoided; otherwise it is not easy to review because we can't focus on the changes required for this feature. Detailed review comments below:
1. DefaultAMSProcessor: all the changes in DefaultAMSProcessor seem unnecessary.
2. FairScheduler.md {noformat} maximum resources a queue can allocate for a single container, expressed in the form of {noformat} has an extra space between "expressed" and "in".
3. I think we should use consistent names, but I saw some places have "Max Container Allocation/maxContainerAllocation" and some other places have "Max Container Resources/maxContainerResources". A bit confusing.
4. The changes to those imports in {{FSLeafQueue}} seem unnecessary; please do not include changes like moving an import to another line. The same comment applies to classes like {{FSParentQueue}}, {{PlacementConstraintProcessor}}, {{QueueProperties}}, {{TestAllocationFileLoaderService}}, {{TestAppManager}}, {{TestCapacityScheduler}}.
5. RMServerUtils.java, lines 331 - 337: changes are not necessary.
6. SchedulerUtils.java: except for the changes in {{normalizeAndvalidateRequest}} and {{validateResourceRequest}}, please remove the rest of the changes.
7. Lots of unnecessary changes in {{TestRMServerUtils}}.
8. Why does it even change {{TestSchedulerNegotiator}} and {{TestAMLaunchFailure}}? Both of them are no longer used, right?
Besides, there are still 18 checkstyle issues to fix. Thanks
> Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, > YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, > YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, > YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers and cannot be limited by queue or and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per queue basis. > > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default value for maximum > container size for all queues and setting maximum resources per queue with > “maxContainerResources” queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that queue resource cap can not be larger than scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
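For readers following the YARN-8468 discussion above, the "suggested solution" boils down to letting a queue override the scheduler-wide maximum container size without ever exceeding it. The following is an illustrative sketch only, not the patch under review; the method shape and parameter names are assumptions.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class QueueMaxAllocationSketch {

  /**
   * Sketch of a per-queue getMaximumResourceCapability(queueName) override:
   * fall back to the scheduler-wide maximum (yarn.scheduler.maximum-allocation-*)
   * when the queue configures no maxContainerAllocation, and clamp a configured
   * queue cap so it can never exceed the scheduler cap.
   */
  static Resource getMaximumResourceCapability(Resource schedulerMax,
                                               Resource queueMax /* null if unset */) {
    if (queueMax == null) {
      return schedulerMax;
    }
    // A queue-level cap larger than the scheduler cap would be a
    // misconfiguration, so take the component-wise minimum.
    return Resources.componentwiseMin(queueMax, schedulerMax);
  }
}
{code}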
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620001#comment-16620001 ] Tao Yang commented on YARN-8771: Thanks [~leftnoteasy] for your review and suggestion. I will update the patch later to improve unit test. > CapacityScheduler fails to unreserve when cluster resource contains empty > resource type > --- > > Key: YARN-8771 > URL: https://issues.apache.org/jira/browse/YARN-8771 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8771.001.patch, YARN-8771.002.patch > > > We found this problem when cluster is almost but not exhausted (93% used), > scheduler kept allocating for an app but always fail to commit, this can > blocking requests from other apps and parts of cluster resource can't be used. > Reproduce this problem: > (1) use DominantResourceCalculator > (2) cluster resource has empty resource type, for example: gpu=0 > (3) scheduler allocates container for app1 who has reserved containers and > whose queue limit or user limit reached(used + required > limit). > Reference codes in RegularContainerAllocator#assignContainer: > {code:java} > // How much need to unreserve equals to: > // max(required - headroom, amountNeedUnreserve) > Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom()); > Resource resourceNeedToUnReserve = > Resources.max(rc, clusterResource, > Resources.subtract(capability, headRoom), > currentResoureLimits.getAmountNeededUnreserve()); > boolean needToUnreserve = > Resources.greaterThan(rc, clusterResource, > resourceNeedToUnReserve, Resources.none()); > {code} > For example, resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu> when > {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}, > needToUnreserve which is the result of {{Resources#greaterThan}} will be > {{false}}. This is not reasonable because required resource did exceed the > headroom and unreserve is needed. > After that, when reaching the unreserve process in > RegularContainerAllocator#assignContainer, unreserve process will be skipped > when shouldAllocOrReserveNewContainer is true (when required containers > > reserved containers) and needToUnreserve is wrongly calculated to be false: > {code:java} > if (availableContainers > 0) { > if (rmContainer == null && reservationsContinueLooking > && node.getLabels().isEmpty()) { > // unreserve process can be wrongly skipped when > shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required > resource did exceed the headroom > if (!shouldAllocOrReserveNewContainer || needToUnreserve) { > ... > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8794) QueuePlacementPolicy add more rules
[ https://issues.apache.org/jira/browse/YARN-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang updated YARN-8794: -- Description: Still need more useful rules: # RejectNonLeafQueue # RejectDefaultQueue # RejectUsers # RejectQueues # DefaultByUser > QueuePlacementPolicy add more rules > --- > > Key: YARN-8794 > URL: https://issues.apache.org/jira/browse/YARN-8794 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Shuai Zhang >Priority: Major > > Still need more useful rules: > # RejectNonLeafQueue > # RejectDefaultQueue > # RejectUsers > # RejectQueues > # DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8794) QueuePlacementPolicy add more rules
Shuai Zhang created YARN-8794: - Summary: QueuePlacementPolicy add more rules Key: YARN-8794 URL: https://issues.apache.org/jira/browse/YARN-8794 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Shuai Zhang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8793) QueuePlacementPolicy bind more information to assgining result
Shuai Zhang created YARN-8793: - Summary: QueuePlacementPolicy bind more information to assgining result Key: YARN-8793 URL: https://issues.apache.org/jira/browse/YARN-8793 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Affects Versions: 3.1.1 Reporter: Shuai Zhang Fair scheduler's QueuePlacementPolicy should bind more information to assigning result: # Whether to terminate the chain of responsibility # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy
Shuai Zhang created YARN-8792: - Summary: Revisit FairScheduler QueuePlacementPolicy Key: YARN-8792 URL: https://issues.apache.org/jira/browse/YARN-8792 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 3.1.1 Reporter: Shuai Zhang Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. There are several problems: # The termination of the responsibility chain should bind to the assigning result instead of the rule. # It should provide a reason when rejecting a request. # Still need more useful rules: ## RejectNonLeafQueue ## RejectDefaultQueue ## RejectUsers ## RejectQueues ## DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
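One possible shape for the richer placement result that YARN-8792/YARN-8793 describe above (binding chain termination and a rejection reason to the assignment itself) is sketched below. All names are invented for illustration and do not come from any attached patch.
{code:java}
/** Hypothetical result object for a queue placement rule evaluation. */
public final class QueuePlacementResult {

  private final String queueName;        // null when nothing was assigned
  private final boolean terminateChain;  // stop evaluating later rules
  private final String rejectionReason;  // why the request was rejected, if it was

  private QueuePlacementResult(String queueName, boolean terminateChain,
                               String rejectionReason) {
    this.queueName = queueName;
    this.terminateChain = terminateChain;
    this.rejectionReason = rejectionReason;
  }

  /** A rule matched; the chain of responsibility stops here. */
  public static QueuePlacementResult assignedTo(String queueName) {
    return new QueuePlacementResult(queueName, true, null);
  }

  /** A rule did not match; later rules may still place the request. */
  public static QueuePlacementResult skip() {
    return new QueuePlacementResult(null, false, null);
  }

  /** A rule such as RejectDefaultQueue or RejectUsers rejects the request outright. */
  public static QueuePlacementResult reject(String reason) {
    return new QueuePlacementResult(null, true, reason);
  }

  public String getQueueName() { return queueName; }
  public boolean shouldTerminateChain() { return terminateChain; }
  public String getRejectionReason() { return rejectionReason; }
}
{code}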
[jira] [Reopened] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
[ https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leiqiang reopened YARN-8752: > yarn-registry.md has wrong word ong-lived,it should be long-lived > - > > Key: YARN-8752 > URL: https://issues.apache.org/jira/browse/YARN-8752 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.0 >Reporter: leiqiang >Priority: Major > Labels: documentation > Attachments: YARN-8752-1.patch > > > In yarn-registry.md line 88, > deploy {color:#FF}ong-lived{color} services instances, this word should > be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived
[ https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leiqiang updated YARN-8752: --- Attachment: (was: YARN-8752-1.patch) > yarn-registry.md has wrong word ong-lived,it should be long-lived > - > > Key: YARN-8752 > URL: https://issues.apache.org/jira/browse/YARN-8752 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.0 >Reporter: leiqiang >Priority: Major > Labels: documentation > Attachments: YARN-8752-1.patch > > > In yarn-registry.md line 88, > deploy {color:#FF}ong-lived{color} services instances, this word should > be {color:#FF}long-lived{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619930#comment-16619930 ] Hadoop QA commented on YARN-8789: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 51s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 5s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 56s{color} | {color:orange} root: The patch generated 8 new + 814 unchanged - 11 fixed = 822 total (was 825) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 32s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 33s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 20s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 50s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}182m 10s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy | | | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8789 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940311/YARN-8789.2.patch | | Optional Tests | dupname asflicense compile javac
[jira] [Updated] (YARN-8791) When STOPSIGNAL is not present then docker inspect returns an extra line feed
[ https://issues.apache.org/jira/browse/YARN-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8791: Attachment: YARN-8791.001.patch > When STOPSIGNAL is not present then docker inspect returns an extra line feed > - > > Key: YARN-8791 > URL: https://issues.apache.org/jira/browse/YARN-8791 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8791.001.patch > > > When the STOPSIGNAL is missing, then an extra line feed is appended to the > output. This messes with the signal sent to the docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8791) When STOPSIGNAL is not present then docker inspect returns an extra line feed
Chandni Singh created YARN-8791: --- Summary: When STOPSIGNAL is not present then docker inspect returns an extra line feed Key: YARN-8791 URL: https://issues.apache.org/jira/browse/YARN-8791 Project: Hadoop YARN Issue Type: Bug Reporter: Chandni Singh Assignee: Chandni Singh When the STOPSIGNAL is missing, then an extra line feed is appended to the output. This messes with the signal sent to the docker container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
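The fix implied by YARN-8791 above is essentially defensive parsing of the docker inspect output. A minimal sketch follows, assuming the stop signal is read back as a string and that SIGTERM (Docker's documented default) is the sensible fallback when an image defines no STOPSIGNAL; the actual patch may differ.
{code:java}
/** Illustrative parsing of a STOPSIGNAL value returned by docker inspect. */
class StopSignalParsing {

  /** Docker's documented default stop signal when an image sets none. */
  private static final String DEFAULT_STOP_SIGNAL = "SIGTERM";

  static String parseStopSignal(String dockerInspectOutput) {
    if (dockerInspectOutput == null) {
      return DEFAULT_STOP_SIGNAL;
    }
    // When STOPSIGNAL is absent, the inspect output can be empty apart from a
    // trailing line feed; trim it so a malformed signal is never passed on
    // (e.g. to "docker kill --signal=...").
    String signal = dockerInspectOutput.trim();
    return signal.isEmpty() ? DEFAULT_STOP_SIGNAL : signal;
  }
}
{code}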
[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619845#comment-16619845 ] Hadoop QA commented on YARN-8696: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 7s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 33s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 53s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 4s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}201m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8696 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940281/YARN-8696.v5.patch | | Optional Tests | dupname asflicense
[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8771: - Target Version/s: 3.1.1, 3.2.0 > CapacityScheduler fails to unreserve when cluster resource contains empty > resource type > --- > > Key: YARN-8771 > URL: https://issues.apache.org/jira/browse/YARN-8771 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8771.001.patch, YARN-8771.002.patch > > > We found this problem when cluster is almost but not exhausted (93% used), > scheduler kept allocating for an app but always fail to commit, this can > blocking requests from other apps and parts of cluster resource can't be used. > Reproduce this problem: > (1) use DominantResourceCalculator > (2) cluster resource has empty resource type, for example: gpu=0 > (3) scheduler allocates container for app1 who has reserved containers and > whose queue limit or user limit reached(used + required > limit). > Reference codes in RegularContainerAllocator#assignContainer: > {code:java} > // How much need to unreserve equals to: > // max(required - headroom, amountNeedUnreserve) > Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom()); > Resource resourceNeedToUnReserve = > Resources.max(rc, clusterResource, > Resources.subtract(capability, headRoom), > currentResoureLimits.getAmountNeededUnreserve()); > boolean needToUnreserve = > Resources.greaterThan(rc, clusterResource, > resourceNeedToUnReserve, Resources.none()); > {code} > For example, resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu> when > {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}, > needToUnreserve which is the result of {{Resources#greaterThan}} will be > {{false}}. This is not reasonable because required resource did exceed the > headroom and unreserve is needed. > After that, when reaching the unreserve process in > RegularContainerAllocator#assignContainer, unreserve process will be skipped > when shouldAllocOrReserveNewContainer is true (when required containers > > reserved containers) and needToUnreserve is wrongly calculated to be false: > {code:java} > if (availableContainers > 0) { > if (rmContainer == null && reservationsContinueLooking > && node.getLabels().isEmpty()) { > // unreserve process can be wrongly skipped when > shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required > resource did exceed the headroom > if (!shouldAllocOrReserveNewContainer || needToUnreserve) { > ... > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619835#comment-16619835 ] Wangda Tan commented on YARN-8771: -- Nice catch! Thanks [~Tao Yang]. Patch LGTM as well. For the test, you can check TestCapacitySchedulerWithMultiResourceTypes as examples about how to do unit tests for multiple resource types without adding resource-types.xml. And I think we should put this to branch-3.1 as well. > CapacityScheduler fails to unreserve when cluster resource contains empty > resource type > --- > > Key: YARN-8771 > URL: https://issues.apache.org/jira/browse/YARN-8771 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8771.001.patch, YARN-8771.002.patch > > > We found this problem when cluster is almost but not exhausted (93% used), > scheduler kept allocating for an app but always fail to commit, this can > blocking requests from other apps and parts of cluster resource can't be used. > Reproduce this problem: > (1) use DominantResourceCalculator > (2) cluster resource has empty resource type, for example: gpu=0 > (3) scheduler allocates container for app1 who has reserved containers and > whose queue limit or user limit reached(used + required > limit). > Reference codes in RegularContainerAllocator#assignContainer: > {code:java} > // How much need to unreserve equals to: > // max(required - headroom, amountNeedUnreserve) > Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom()); > Resource resourceNeedToUnReserve = > Resources.max(rc, clusterResource, > Resources.subtract(capability, headRoom), > currentResoureLimits.getAmountNeededUnreserve()); > boolean needToUnreserve = > Resources.greaterThan(rc, clusterResource, > resourceNeedToUnReserve, Resources.none()); > {code} > For example, resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu> when > {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}, > needToUnreserve which is the result of {{Resources#greaterThan}} will be > {{false}}. This is not reasonable because required resource did exceed the > headroom and unreserve is needed. > After that, when reaching the unreserve process in > RegularContainerAllocator#assignContainer, unreserve process will be skipped > when shouldAllocOrReserveNewContainer is true (when required containers > > reserved containers) and needToUnreserve is wrongly calculated to be false: > {code:java} > if (availableContainers > 0) { > if (rmContainer == null && reservationsContinueLooking > && node.getLabels().isEmpty()) { > // unreserve process can be wrongly skipped when > shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required > resource did exceed the headroom > if (!shouldAllocOrReserveNewContainer || needToUnreserve) { > ... > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
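As a worked restatement of the comparison quoted in the YARN-8771 description above, the sketch below plugs in the reported numbers (headroom <0GB, 8 vcores, 0 gpu>, capability <8GB, 2 vcores, 0 gpu>, a cluster whose extended "gpu" resource totals zero). It only reproduces the reported behaviour for illustration; Resources.none() stands in for amountNeededUnreserve, and this is not the fix.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

class UnreserveComparisonSketch {

  static void illustrate(Resource clusterResource) { // e.g. <100GB, 100 vcores, 0 gpu>
    ResourceCalculator rc = new DominantResourceCalculator();
    Resource headRoom   = Resource.newInstance(0, 8);         // <0GB, 8 vcores, 0 gpu>
    Resource capability = Resource.newInstance(8 * 1024, 2);  // <8GB, 2 vcores, 0 gpu>

    // required - headroom = <8GB, -6 vcores, 0 gpu>
    Resource resourceNeedToUnReserve = Resources.max(rc, clusterResource,
        Resources.subtract(capability, headRoom), Resources.none());

    // Per the report, this evaluates to false even though memory exceeds the
    // headroom, so the unreserve branch in assignContainer() is skipped.
    boolean needToUnreserve = Resources.greaterThan(rc, clusterResource,
        resourceNeedToUnReserve, Resources.none());
    System.out.println("needToUnreserve = " + needToUnreserve);
  }
}
{code}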
[jira] [Commented] (YARN-8784) DockerLinuxContainerRuntime prevents access to distributed cache entries on a full disk
[ https://issues.apache.org/jira/browse/YARN-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619817#comment-16619817 ] Hadoop QA commented on YARN-8784: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 23s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 141 unchanged - 0 fixed = 144 total (was 141) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 23s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8784 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940298/YARN-8784.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 451cfea3df3f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e71f61e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21868/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21868/testReport/ | | Max. process+thread count | 332 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate
[ https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619809#comment-16619809 ] Arun Suresh commented on YARN-1013: --- Thanks for taking a quick look, [~elgoiri]. The patch was more of a POC (I should have named it as such) that I built on top of current branch-2 plus some YARN-1011 patches pulled from that branch, to vet the approach. But yes, I shall clean it up and put up a patch for trunk. > CS should watch resource utilization of containers and allocate speculative > containers if appropriate > - > > Key: YARN-1013 > URL: https://issues.apache.org/jira/browse/YARN-1013 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-1013-001.branch-2.patch > > > CS should watch resource utilization of containers (provided by NM in > heartbeat) and allocate speculative containers (at lower OS priority) if > appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
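To make the idea above concrete, here is a minimal, purely hypothetical Java sketch (not the attached YARN-1013 patch) of the decision a utilization-aware scheduler would make on each NM heartbeat: compare the node's reported utilization against its capacity and only then hand out a speculative (opportunistic) container. The {{NodeSnapshot}} type, its field names, and the 0.9 over-allocation limit are all illustrative assumptions.
{code:java}
// Illustrative only; not taken from YARN-1013-001.branch-2.patch.
public final class OverAllocationSketch {

  /** Simplified view of what the NM reports in its heartbeat. */
  static final class NodeSnapshot {
    final long capacityMb;   // total memory the node advertises
    final long allocatedMb;  // memory promised to GUARANTEED containers
    final long utilizedMb;   // memory actually in use right now

    NodeSnapshot(long capacityMb, long allocatedMb, long utilizedMb) {
      this.capacityMb = capacityMb;
      this.allocatedMb = allocatedMb;
      this.utilizedMb = utilizedMb;
    }
  }

  /**
   * Decide whether a speculative container of the given size fits, based on
   * actual utilization rather than on what has already been allocated.
   */
  static boolean canOverAllocate(NodeSnapshot node, long requestMb,
      double overAllocationLimit) {
    long headroomMb =
        (long) (node.capacityMb * overAllocationLimit) - node.utilizedMb;
    return headroomMb >= requestMb;
  }

  public static void main(String[] args) {
    // Fully allocated node (no guaranteed headroom) that is mostly idle.
    NodeSnapshot node = new NodeSnapshot(32_768, 32_768, 8_192);
    System.out.println(canOverAllocate(node, 4_096, 0.9)); // true
  }
}
{code}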
[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619796#comment-16619796 ] Hadoop QA commented on YARN-7599: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} YARN-7402 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 24s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 28s{color} | {color:green} YARN-7402 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 44s{color} | {color:green} YARN-7402 passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 20s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 35s{color} | {color:green} YARN-7402 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 36s{color} | {color:green} YARN-7402 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s{color} | {color:green} YARN-7402 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 6s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 24s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 25s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-yarn-server-globalpolicygenerator in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}122m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-7599 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940285/YARN-7599-YARN-7402.v5.patch | | Optional Tests | dupname asflicense compile javac
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.2.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
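As a rough illustration of the bounded-queue idea in the description (a minimal sketch under assumed names, not the attached YARN-8789 patch), a dispatcher backed by a capacity-limited {{LinkedBlockingQueue}} makes producers block on {{put()}} once the queue is full, which throttles a flood of events instead of letting the queue grow until the ApplicationMaster runs out of memory:
{code:java}
// Minimal sketch of a bounded event dispatcher; all names are illustrative.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedDispatcherSketch {
  private final BlockingQueue<Runnable> eventQueue;
  private final Thread handlerThread;
  private volatile boolean stopped = false;

  public BoundedDispatcherSketch(int capacity) {
    // Capacity-bounded queue: put() blocks when full, applying back-pressure
    // to whatever is generating events.
    this.eventQueue = new LinkedBlockingQueue<>(capacity);
    this.handlerThread = new Thread(() -> {
      while (!stopped || !eventQueue.isEmpty()) {
        try {
          eventQueue.take().run();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }, "event-handler");
  }

  public void start() {
    handlerThread.start();
  }

  /** Producers block here once 'capacity' events are already waiting. */
  public void dispatch(Runnable event) throws InterruptedException {
    eventQueue.put(event);
  }

  public void stop() throws InterruptedException {
    stopped = true;
    dispatch(() -> { });  // wake the handler so it can observe 'stopped'
    handlerThread.join();
  }
}
{code}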
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: (was: YARN-8789.2.patch) > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.2.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: (was: YARN-8789.2.patch) > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.2.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate
[ https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619771#comment-16619771 ] Íñigo Goiri commented on YARN-1013: --- Thanks [~asuresh] for [^YARN-1013-001.branch-2.patch]. A couple of general questions: * Can we get a patch for trunk so that Yetus is able to run (branch-2 has issues)? * Can you give an overview comparing this to the FS approach? I went through the patch and it is hard to compare, as this uses the allocator. Comments on the patch itself: * Some of the debug messages seem to be for development. Should we keep all of them? * Can you add more comments to {{testContainerOverAllocation()}}? For example, we set up one node without overallocation and one with it. Why those numbers, and what is the goal? * Can we add a couple of lower-level unit tests, just testing the allocator or the scheduler? * There are many whitespace fixes; can we avoid most of them? In particular, pass null by default as the second parameter to registerNode for TestAMRestart and TestReservations. > CS should watch resource utilization of containers and allocate speculative > containers if appropriate > - > > Key: YARN-1013 > URL: https://issues.apache.org/jira/browse/YARN-1013 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-1013-001.branch-2.patch > > > CS should watch resource utilization of containers (provided by NM in > heartbeat) and allocate speculative containers (at lower OS priority) if > appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: (was: YARN-8789.2.patch) > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.2.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8784) DockerLinuxContainerRuntime prevents access to distributed cache entries on a full disk
[ https://issues.apache.org/jira/browse/YARN-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-8784: -- Attachment: YARN-8784.001.patch > DockerLinuxContainerRuntime prevents access to distributed cache entries on a > full disk > --- > > Key: YARN-8784 > URL: https://issues.apache.org/jira/browse/YARN-8784 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0, 3.1.1 >Reporter: Jason Lowe >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-8784.001.patch > > > DockerLinuxContainerRuntime bind mounts the filecache and usercache > directories into the container to allow tasks to access entries in the > distributed cache. However it only bind mounts directories on disks that > are considered good, and disks that are full or bad are not in that list. If > a container tries to run with a distributed cache entry that has been > previously localized to a disk that is now considered full/bad, the dist > cache directory will _not_ be bind-mounted into the container's filesystem > namespace. At that point any symlinks in the container's current working > directory that point to those disks will reference invalid paths. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
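To illustrate the bind-mount behavior described above (a hypothetical sketch only, not the attached YARN-8784 patch, and the directory layout is simplified): if the {{-v}} arguments are built only from the currently good local dirs, entries previously localized to a now-full disk become invisible inside the container. One possible shape of the fix is to build the mount list from the union of good and full dirs:
{code:java}
// Hypothetical sketch of building docker bind-mount arguments; the real
// logic lives in DockerLinuxContainerRuntime and differs in detail.
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BindMountSketch {
  static List<String> buildVolumeArgs(List<String> goodLocalDirs,
      List<String> fullLocalDirs, String user, String appId) {
    // Union of good and full disks: a dist-cache entry localized to a disk
    // that later filled up must still be visible inside the container.
    Set<String> mountRoots = new LinkedHashSet<>();
    mountRoots.addAll(goodLocalDirs);
    mountRoots.addAll(fullLocalDirs);

    List<String> args = new ArrayList<>();
    for (String dir : mountRoots) {
      args.add("-v");
      args.add(dir + "/filecache:" + dir + "/filecache:ro");
      args.add("-v");
      args.add(dir + "/usercache/" + user + "/appcache/" + appId + ":"
          + dir + "/usercache/" + user + "/appcache/" + appId + ":rw");
    }
    return args;
  }
}
{code}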
[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619730#comment-16619730 ] Hudson commented on YARN-8648: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14997 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14997/]) YARN-8648. Container cgroups are leaked when using docker. Contributed (jlowe: rev 2df0a8dcb3dfde15d216481cc1296d97d2cb5d43) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerCommandExecutor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRmCommand.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerRmCommand.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8648.001.patch, YARN-8648.002.patch, > YARN-8648.003.patch, YARN-8648.004.patch, YARN-8648.005.patch, > YARN-8648.006.patch > > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup, All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. 
> The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd.So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
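For anyone who wants to see the scale of the leak described above, the following stand-alone sketch (not the committed fix, which lives in the native container-executor) walks every cgroup controller and removes empty, leftover per-container directories under the hadoop-yarn hierarchy. The paths and the {{container_}} prefix are assumptions taken from the description:
{code:java}
// Illustrative cleanup of leaked container cgroups; the actual YARN-8648 fix
// is in container-executor.c, this merely mirrors the idea in plain Java.
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class CgroupLeakSweeper {
  public static void main(String[] args) throws IOException {
    Path cgroupRoot = Paths.get("/sys/fs/cgroup");
    try (Stream<Path> controllers = Files.list(cgroupRoot)) {
      for (Path controller : (Iterable<Path>) controllers::iterator) {
        Path hierarchy = controller.resolve("hadoop-yarn");
        if (!Files.isDirectory(hierarchy)) {
          continue;
        }
        try (Stream<Path> entries = Files.list(hierarchy)) {
          for (Path entry : (Iterable<Path>) entries::iterator) {
            // Leaked entries are empty container_* directories left behind
            // after docker removed only the leaf docker_container_id cgroup.
            if (Files.isDirectory(entry)
                && entry.getFileName().toString().startsWith("container_")) {
              try {
                Files.delete(entry);  // rmdir; fails if the cgroup is in use
                System.out.println("Removed leaked cgroup " + entry);
              } catch (DirectoryNotEmptyException | AccessDeniedException e) {
                // Still in use or not ours to clean up; leave it alone.
              }
            }
          }
        }
      }
    }
  }
}
{code}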
[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619716#comment-16619716 ] Hadoop QA commented on YARN-8789: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 51s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 14s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 30s{color} | {color:orange} root: The patch generated 9 new + 788 unchanged - 10 fixed = 797 total (was 798) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 22s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 42s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 55s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 16s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}198m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy | | | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8789 | | JIRA Patch URL |
[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate
[ https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619709#comment-16619709 ] Arun Suresh commented on YARN-1013: --- Attached an initial version of the patch for branch-2. Kindly review.. > CS should watch resource utilization of containers and allocate speculative > containers if appropriate > - > > Key: YARN-1013 > URL: https://issues.apache.org/jira/browse/YARN-1013 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-1013-001.branch-2.patch > > > CS should watch resource utilization of containers (provided by NM in > heartbeat) and allocate speculative containers (at lower OS priority) if > appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate
[ https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-1013: -- Attachment: YARN-1013-001.branch-2.patch > CS should watch resource utilization of containers and allocate speculative > containers if appropriate > - > > Key: YARN-1013 > URL: https://issues.apache.org/jira/browse/YARN-1013 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Arun Suresh >Priority: Major > Attachments: YARN-1013-001.branch-2.patch > > > CS should watch resource utilization of containers (provided by NM in > heartbeat) and allocate speculative containers (at lower OS priority) if > appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619670#comment-16619670 ] Jason Lowe commented on YARN-8648: -- Thanks, [~billie.rinaldi]! Looks like this is good to go then. Committing this. > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > Attachments: YARN-8648.001.patch, YARN-8648.002.patch, > YARN-8648.003.patch, YARN-8648.004.patch, YARN-8648.005.patch, > YARN-8648.006.patch > > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup, All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd.So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8790) Authentication Filter change to force security check
Zian Chen created YARN-8790: --- Summary: Authentication Filter change to force security check Key: YARN-8790 URL: https://issues.apache.org/jira/browse/YARN-8790 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zian Chen Hadoop node manager REST API is authenticated using AuthenticationFilter from Hadoop-auth project. AuthenticationFilter is added to the new WebSocket URL path spec. The requested remote user is verified to match the container owner to allow WebSocket connection to be established. WebSocket servlet code enforces the username match check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
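A minimal sketch of the username-match check described above, assuming the authenticated remote user is exposed on the servlet request once AuthenticationFilter has run (the class and method names here are illustrative, not the actual patch):
{code:java}
// Illustrative enforcement of "remote user must match the container owner".
import javax.servlet.http.HttpServletRequest;

public final class ContainerShellAuthSketch {

  /**
   * @param request        request that already passed AuthenticationFilter
   * @param containerOwner user who launched the container being attached to
   * @return true if the WebSocket connection should be allowed
   */
  public static boolean isAllowed(HttpServletRequest request,
      String containerOwner) {
    String remoteUser = request.getRemoteUser();  // set by the auth filter
    return remoteUser != null && remoteUser.equals(containerOwner);
  }
}
{code}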
[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker
[ https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619655#comment-16619655 ] Billie Rinaldi commented on YARN-8648: -- Thanks for checking in, [~Jim_Brennan] and [~jlowe]. I don't have any issues with the new behavior of the remove operation failing when cgroups fail to be cleaned up. > Container cgroups are leaked when using docker > -- > > Key: YARN-8648 > URL: https://issues.apache.org/jira/browse/YARN-8648 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: Docker > Attachments: YARN-8648.001.patch, YARN-8648.002.patch, > YARN-8648.003.patch, YARN-8648.004.patch, YARN-8648.005.patch, > YARN-8648.006.patch > > > When you run with docker and enable cgroups for cpu, docker creates cgroups > for all resources on the system, not just for cpu. For instance, if the > {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, > the nodemanager will create a cgroup for each container under > {{/sys/fs/cgroup/cpu/hadoop-yarn}}. In the docker case, we pass this path > via the {{--cgroup-parent}} command line argument. Docker then creates a > cgroup for the docker container under that, for instance: > {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}. > When the container exits, docker cleans up the {{docker_container_id}} > cgroup, and the nodemanager cleans up the {{container_id}} cgroup, All is > good under {{/sys/fs/cgroup/hadoop-yarn}}. > The problem is that docker also creates that same hierarchy under every > resource under {{/sys/fs/cgroup}}. On the rhel7 system I am using, these > are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, > perf_event, and systemd.So for instance, docker creates > {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but > it only cleans up the leaf cgroup {{docker_container_id}}. Nobody cleans up > the {{container_id}} cgroups for these other resources. On one of our busy > clusters, we found > 100,000 of these leaked cgroups. > I found this in our 2.8-based version of hadoop, but I have been able to > repro with current hadoop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619631#comment-16619631 ] Botong Huang commented on YARN-7599: Thanks [~bibinchundatt] for the review! [^YARN-7599-YARN-7402.v5.patch] uploaded with fixes. bq. 4. Rename testcase name -> testBasicCase This is already the test case name, typo? bq. 1. During long maintainance period of cluster, We might require option to disable cleaner at run time. rt ?? Application cleaner is disabled when YarnConfiguration.GPG_APPCLEANER_INTERVAL_MS is set to zero or negative value: {code:java} 120 long appCleanerIntervalMs = 121 getConfig().getLong(YarnConfiguration.GPG_APPCLEANER_INTERVAL_MS, 122 YarnConfiguration.DEFAULT_GPG_APPCLEANER_INTERVAL_MS); 123 if (appCleanerIntervalMs > 0) { 124 this.scheduledExecutorService.scheduleAtFixedRate(this.applicationCleaner, 125 0, appCleanerIntervalMs, TimeUnit.MILLISECONDS); {code} > [GPG] ApplicationCleaner in Global Policy Generator > --- > > Key: YARN-7599 > URL: https://issues.apache.org/jira/browse/YARN-7599 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-7599-YARN-7402.v1.patch, > YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, > YARN-7599-YARN-7402.v4.patch, YARN-7599-YARN-7402.v5.patch > > > In Federation, we need a cleanup service for StateStore as well as Yarn > Registry. For the former, we need to remove old application records. For the > latter, failed and killed applications might leave records in the Yarn > Registry (see YARN-6128). We plan to do both cleanup work in > ApplicationCleaner in GPG -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-7599: --- Attachment: YARN-7599-YARN-7402.v5.patch > [GPG] ApplicationCleaner in Global Policy Generator > --- > > Key: YARN-7599 > URL: https://issues.apache.org/jira/browse/YARN-7599 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-7599-YARN-7402.v1.patch, > YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, > YARN-7599-YARN-7402.v4.patch, YARN-7599-YARN-7402.v5.patch > > > In Federation, we need a cleanup service for StateStore as well as Yarn > Registry. For the former, we need to remove old application records. For the > latter, failed and killed applications might leave records in the Yarn > Registry (see YARN-6128). We plan to do both cleanup work in > ApplicationCleaner in GPG -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8696: --- Attachment: YARN-8696.v5.patch > [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async > --- > > Key: YARN-8696 > URL: https://issues.apache.org/jira/browse/YARN-8696 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch, > YARN-8696.v3.patch, YARN-8696.v4.patch, YARN-8696.v5.patch > > > Today in _FederationInterceptor_, the heartbeat to home sub-cluster is > synchronous. After the heartbeat is sent out to home sub-cluster, it waits > for the home response to come back before merging and returning the (merged) > heartbeat result to back AM. If home sub-cluster is suffering from connection > issues, or down during an YarnRM master-slave switch, all heartbeat threads > in _FederationInterceptor_ will be blocked waiting for home response. As a > result, the successful UAM heartbeats from secondary sub-clusters will not be > returned to AM at all. Additionally, because of the fact that we kept the > same heartbeat responseId between AM and home RM, lots of tricky handling are > needed regarding the responseId resync when it comes to > _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart > (YARN-6127, YARN-1336), home RM master-slave switch etc. > In this patch, we change the heartbeat to home sub-cluster to asynchronous, > same as the way we handle UAM heartbeats in secondaries. So that any > sub-cluster down or connection issues won't impact AM getting responses from > other sub-clusters. The responseId is also managed separately for home > sub-cluster and AM, and they increment independently. The resync logic > becomes much cleaner. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
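To make the asynchronous heartbeat change concrete (a sketch with invented names only; the real implementation is inside FederationInterceptor and handles UAMs, responseId management, and resync on top of this), the home heartbeat can be fired on its own executor and merged into the AM-facing response whenever it arrives, instead of blocking the allocate call:
{code:java}
// Sketch of turning a synchronous home-sub-cluster heartbeat into an async one.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class AsyncHeartbeatSketch<R> {
  private final ExecutorService homeExecutor =
      Executors.newSingleThreadExecutor();

  /**
   * Fire the heartbeat to the home sub-cluster without blocking the caller.
   * The merge callback runs whenever (and if) the home response arrives, so
   * responses from secondary sub-clusters can still flow back to the AM.
   */
  public void heartbeatAsync(Supplier<R> sendToHome, Consumer<R> mergeIntoAm,
      Consumer<Throwable> onError) {
    CompletableFuture
        .supplyAsync(sendToHome, homeExecutor)
        .whenComplete((response, error) -> {
          if (error != null) {
            onError.accept(error);  // home RM down or failing over
          } else {
            mergeIntoAm.accept(response);
          }
        });
  }

  public void shutdown() {
    homeExecutor.shutdownNow();
  }
}
{code}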
[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619572#comment-16619572 ] Bibin A Chundatt commented on YARN-7599: Thank you [~botong] for the clarification. A few comments on the patch: # Can you change this to a single configuration similar to {{dfs.http.client.retry.policy.spec}} {min,max,interval}? Advantage: it reduces the number of configs. {code} 3381 public static final String GPG_APPCLEANER_MIN_ROUTER_SUCCESS = 3382 FEDERATION_GPG_PREFIX + "application.cleaner.min-router-success"; 3389 public static final String GPG_APPCLEANER_MAX_ROUTER_RETRY = 3390 FEDERATION_GPG_PREFIX + "application.cleaner.max-router-retry"; 3391 public static final int DEFAULT_GPG_APPCLEANER_MAX_ROUTER_RETRY = 10; 3397 public static final String GPG_APPCLEANER_ROUTER_RETRY_INTEVAL_MS = 3398 FEDERATION_GPG_PREFIX + "application.cleaner.router-retry-interval-ms"; {code} # The class passed to {{LoggerFactory.getLogger}} should be DefaultApplicationCleaner {code} 38 public class DefaultApplicationCleaner extends ApplicationCleaner { 39private static final Logger LOG = 40LoggerFactory.getLogger(ApplicationCleaner.class); {code} # Log flooding will happen in debug mode with the current implementation. A comma-separated list is better: routerApps.stream().map(Object::toString).collect(Collectors.joining(",")); {code} 59for (ApplicationId appId : routerApps) { 60 LOG.debug("Running application {} in the cluster", appId.toString()); 61} {code} # Rename testcase name -> {{testBasicCase}} Clarification # During long maintainance period of cluster, We might require option to disable cleaner at run time. rt ?? > [GPG] ApplicationCleaner in Global Policy Generator > --- > > Key: YARN-7599 > URL: https://issues.apache.org/jira/browse/YARN-7599 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-7599-YARN-7402.v1.patch, > YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, > YARN-7599-YARN-7402.v4.patch > > > In Federation, we need a cleanup service for StateStore as well as Yarn > Registry. For the former, we need to remove old application records. For the > latter, failed and killed applications might leave records in the Yarn > Registry (see YARN-6128). We plan to do both cleanup work in > ApplicationCleaner in GPG -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
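On the log-flooding point above, a hedged sketch of the single debug line being suggested (assuming {{routerApps}} is a {{Set<ApplicationId>}}, as in the quoted code):
{code:java}
// One debug line listing every application instead of one line per app.
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RouterAppLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(RouterAppLogging.class);

  static void logRouterApps(Set<ApplicationId> routerApps) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Applications known to the Router: {}",
          routerApps.stream()
              .map(ApplicationId::toString)
              .collect(Collectors.joining(",")));
    }
  }
}
{code}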
[jira] [Commented] (YARN-8777) Container Executor C binary change to execute interactive docker command
[ https://issues.apache.org/jira/browse/YARN-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619569#comment-16619569 ] Zian Chen commented on YARN-8777: - Hi [~eyang], thanks for the patch. Some quick suggestions and questions: 1. {code:java} /** + * Get the Docker exec command line string. The function will verify that the params file is meant for the exec command. + * @param command_file File containing the params for the Docker start command + * @param conf Configuration struct containing the container-executor.cfg details + * @param out Buffer to fill with the exec command + * @param outlen Size of the output buffer + * @return Return code with 0 indicating success and non-zero codes indicating error + */ +int get_docker_exec_command(const char* command_file, const struct configuration* conf, args *args);{code} The doc comment's param list has {{out}} and {{outlen}}, which do not match the signature, and the description for param {{args}} is missing. Is this a typo? 2. For the code reuse you discussed with [~ebadger], my quick thought is that instead of passing parameters from the node manager, we could define an enum indexing several commonly used command options and have the node manager pass only an index that matches one of the enum elements. That way we keep some flexibility without opening up a bigger attack surface. 3. This patch seems to focus on running the docker exec -it command to attach to a running container, but later, when the pipeline is being built, should we also take care of passing shell commands into the container? > Container Executor C binary change to execute interactive docker command > > > Key: YARN-8777 > URL: https://issues.apache.org/jira/browse/YARN-8777 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zian Chen >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8777.001.patch > > > Since Container Executor provides Container execution using the native > container-executor binary, we also need to make changes to accept new > “dockerExec” method to invoke the corresponding native function to execute > docker exec command to the running container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
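For context on what the interactive attach being discussed boils down to (illustrative only; in YARN it is the privileged native container-executor, not the NM's Java code, that invokes docker on the node), the command in question is essentially {{docker exec -it <container> bash}} with the terminal wired through:
{code:java}
// Illustrative stand-alone example of an interactive docker exec attach.
import java.io.IOException;

public class DockerExecSketch {
  public static void main(String[] args)
      throws IOException, InterruptedException {
    String containerId = args.length > 0 ? args[0] : "my_container";
    Process p =
        new ProcessBuilder("docker", "exec", "-it", containerId, "bash")
            .inheritIO()  // pass the caller's terminal through for interactivity
            .start();
    System.exit(p.waitFor());
  }
}
{code}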
[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619558#comment-16619558 ] Hadoop QA commented on YARN-8757: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 33s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine in trunk has 4 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 12s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine: The patch generated 8 new + 52 unchanged - 3 fixed = 60 total (was 55) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine generated 0 new + 2 unchanged - 2 fixed = 2 total (was 4) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green} hadoop-yarn-submarine in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 11s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8757 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940269/YARN-8757.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 6e3853f8c8ee 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f938925 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/21865/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html | | checkstyle |
[jira] [Commented] (YARN-8767) TestStreamingStatus fails
[ https://issues.apache.org/jira/browse/YARN-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619536#comment-16619536 ] Hadoop QA commented on YARN-8767: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} hadoop-tools_hadoop-streaming generated 0 new + 78 unchanged - 5 fixed = 78 total (was 83) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 16s{color} | {color:orange} hadoop-tools/hadoop-streaming: The patch generated 1 new + 49 unchanged - 16 fixed = 50 total (was 65) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 8s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 47s{color} | {color:green} hadoop-streaming in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 48s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8767 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940255/YARN-8767.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d58f9c7c208f 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f938925 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21863/artifact/out/diff-checkstyle-hadoop-tools_hadoop-streaming.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21863/testReport/ | | Max. process+thread count | 750 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-streaming U: hadoop-tools/hadoop-streaming | | Console output |
[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619511#comment-16619511 ] Giovanni Matteo Fumarola commented on YARN-8696: Thanks [~botong] . [^YARN-8696.v4.patch]looks good. Please rebase it. > [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async > --- > > Key: YARN-8696 > URL: https://issues.apache.org/jira/browse/YARN-8696 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch, > YARN-8696.v3.patch, YARN-8696.v4.patch > > > Today in _FederationInterceptor_, the heartbeat to home sub-cluster is > synchronous. After the heartbeat is sent out to home sub-cluster, it waits > for the home response to come back before merging and returning the (merged) > heartbeat result to back AM. If home sub-cluster is suffering from connection > issues, or down during an YarnRM master-slave switch, all heartbeat threads > in _FederationInterceptor_ will be blocked waiting for home response. As a > result, the successful UAM heartbeats from secondary sub-clusters will not be > returned to AM at all. Additionally, because of the fact that we kept the > same heartbeat responseId between AM and home RM, lots of tricky handling are > needed regarding the responseId resync when it comes to > _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart > (YARN-6127, YARN-1336), home RM master-slave switch etc. > In this patch, we change the heartbeat to home sub-cluster to asynchronous, > same as the way we handle UAM heartbeats in secondaries. So that any > sub-cluster down or connection issues won't impact AM getting responses from > other sub-clusters. The responseId is also managed separately for home > sub-cluster and AM, and they increment independently. The resync logic > becomes much cleaner. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs
[ https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reopened YARN-8786: -- This should be left open to track the sporadic failure in creating directories. YARN-8751 may make this sporadic problem not take out the NM, but it still causes the requested container launch to fail. That should be fixed, and this JIRA can track that effort. > LinuxContainerExecutor fails sporadically in create_local_dirs > -- > > Key: YARN-8786 > URL: https://issues.apache.org/jira/browse/YARN-8786 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jon Bender >Priority: Major > > We started using CGroups with LinuxContainerExecutor recently, running Apache > Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn > container will fail with a message like the following: > {code:java} > [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: > Container container_1530684675517_516620_01_020846 transitioned from > SCHEDULED to RUNNING > [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO > monitor.ContainersMonitorImpl: Starting resource-monitoring for > container_1530684675517_516620_01_020846 > [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN > privileged.PrivilegedOperationExecutor: Shell execution returned exit code: > 35. Privileged Execution Operation Stderr: > [2018-09-02 23:48:02.506159] Could not create container dirsCould not create > local files and directories > [2018-09-02 23:48:02.506220] > [2018-09-02 23:48:02.506238] Stdout: main : command provided 1 > [2018-09-02 23:48:02.506258] main : run as user is nobody > [2018-09-02 23:48:02.506282] main : requested yarn user is root > [2018-09-02 23:48:02.506294] Getting exit code file... > [2018-09-02 23:48:02.506307] Creating script paths... > [2018-09-02 23:48:02.506330] Writing pid file... > [2018-09-02 23:48:02.506366] Writing to tmp file > /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp > [2018-09-02 23:48:02.506389] Writing to cgroup task files... > [2018-09-02 23:48:02.506402] Creating local dirs... > [2018-09-02 23:48:02.506414] Getting exit code file... > [2018-09-02 23:48:02.506435] Creating script paths... > {code} > Looking at the container executor source it's traceable to errors here: > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604] > And ultimately to > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672] > The root failure seems to be in the underlying mkdir call, but that exit code > / errno is swallowed so we don't have more details. We tend to see this when > many containers start at the same time for the same application on a host, > and suspect it may be related to some race conditions around those shared > directories between containers for the same application. 
> For example, this is a typical pattern in the audit logs: > {code:java} > [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012870 > [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN > nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - > Failed TARGET=ContainerImplRESULT=FAILURE DESCRIPTION=Container failed > with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > {code} > Two containers for the same application starting in quick succession followed > by the EXITED_WITH_FAILURE step (exit code 35). > We plan to upgrade to 3.1.x soon but I don't expect this to be fixed by this, > the only major JIRAs that affected the executor since 3.0.0 seem unrelated > ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8] > and > [https://github.com/apache/hadoop/commit/a82be7754d74f4d16b206427b91e700bb5f44d56]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
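The suspected failure mode above, two containers of the same application racing to create shared application-level directories, can be illustrated in a few lines. This is only a conceptual sketch in Java rather than the native container-executor code, with a made-up path; it is not the actual fix:
{code:java}
import java.io.File;
import java.io.IOException;

// Illustration only: a naive "mkdirs must return true" check fails when
// another container created the same application-level directory first,
// while treating "already exists as a directory" as success tolerates the race.
public class MkdirRaceSketch {
  static void createAppDirNaive(File dir) throws IOException {
    if (!dir.mkdirs()) {                       // loses the race -> returns false
      throw new IOException("Could not create " + dir);
    }
  }

  static void createAppDirTolerant(File dir) throws IOException {
    if (!dir.mkdirs() && !dir.isDirectory()) { // only fail if it truly does not exist
      throw new IOException("Could not create " + dir);
    }
  }

  public static void main(String[] args) throws Exception {
    File appDir = new File(System.getProperty("java.io.tmpdir"),
        "usercache/root/appcache/application_123");
    Runnable container = () -> {
      try {
        createAppDirTolerant(appDir);          // swap in createAppDirNaive to see the failure
      } catch (IOException e) {
        System.err.println(Thread.currentThread().getName() + ": " + e);
      }
    };
    Thread t1 = new Thread(container, "container_01");
    Thread t2 = new Thread(container, "container_02");
    t1.start(); t2.start();
    t1.join(); t2.join();
  }
}
{code}
Whether the native code's failure is exactly this pattern is not confirmed in the issue; the swallowed errno mentioned above is what would settle it.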
[jira] [Commented] (YARN-8725) Submarine job staging directory has a lot of useless PRIMARY_WORKER-launch-script-***.sh scripts when submitting a job multiple times
[ https://issues.apache.org/jira/browse/YARN-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619480 ] Wangda Tan commented on YARN-8725: -- [~tangzhankun], cleaning up the whole staging dir seems overkill because models, etc., are placed under that directory by default as well. And the logic in your patch cleans up dirs after the job is submitted. It is possible that workers get launched after the dir is deleted. I'm not sure we can do much that is meaningful here in the client code. It might be better to do this on the server side, though I don't have a clear idea about how to handle the service part. Maybe it should be a plugin of ApiServer, or a completely new service like a system service. > Submarine job staging directory has a lot of useless > PRIMARY_WORKER-launch-script-***.sh scripts when submitting a job multiple > times > -- > > Key: YARN-8725 > URL: https://issues.apache.org/jira/browse/YARN-8725 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zac Zhou >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8725-trunk.001.patch > > > Submarine jobs upload core-site.xml, hdfs-site.xml, job.info and > PRIMARY_WORKER-launch-script.sh to staging dir. > The core-site.xml, hdfs-site.xml and job.info would be overwritten if a job > is submitted multiple times. > But PRIMARY_WORKER-launch-script.sh would not be overwritten, as it has > random numbers in its name. > The files in the staging dir are as follows: > {code:java} > -rw-r- 2 hadoop hdfs 580 2018-08-17 10:11 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script6954941665090337726.sh > -rw-r- 2 hadoop hdfs 580 2018-08-17 10:02 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script7037369696166769734.sh > -rw-r- 2 hadoop hdfs 580 2018-08-17 10:06 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script8047707294763488040.sh > -rw-r- 2 hadoop hdfs 15225 2018-08-17 18:46 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script8122565781159446375.sh > -rw-r- 2 hadoop hdfs 580 2018-08-16 20:48 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script8598604480700049845.sh > -rw-r- 2 hadoop hdfs 580 2018-08-17 14:53 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script971703616848859353.sh > -rw-r- 2 hadoop hdfs 580 2018-08-17 10:16 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script990214235580089093.sh > -rw-r- 2 hadoop hdfs 8815 2018-08-27 15:54 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/core-site.xml > -rw-r- 2 hadoop hdfs 11583 2018-08-27 15:54 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/hdfs-site.xml > -rw-rw-rw- 2 hadoop hdfs 846 2018-08-22 10:56 > hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/job.info > {code} > > We should stop the staging dir from growing or have a way to clean it up -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
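For reference, one way to reclaim space without deleting the whole staging directory would be to remove only old launch scripts by modification time. The sketch below uses the standard Hadoop FileSystem API; the staging path and the seven-day retention window are illustrative assumptions, and as noted above a server-side cleanup may still be the preferable design:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: delete PRIMARY_WORKER-launch-script*.sh files older than 7 days
// from a job's staging directory. Paths and retention are examples only.
public class StagingScriptCleanup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path staging = new Path(
        "hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging");
    long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;

    FileSystem fs = staging.getFileSystem(conf);
    FileStatus[] scripts =
        fs.globStatus(new Path(staging, "PRIMARY_WORKER-launch-script*.sh"));
    if (scripts == null) {
      return; // nothing matched the glob
    }
    for (FileStatus s : scripts) {
      if (s.getModificationTime() < cutoff) {
        fs.delete(s.getPath(), false); // non-recursive: single script file
      }
    }
  }
}
{code}
A time-based cutoff is one way to avoid deleting a script that a still-launching worker may need, which is the race raised in the comment above.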
[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619469 ] Wangda Tan commented on YARN-8757: -- Thanks [~sunilg], For 1. The default case is already handled by the service spec's artifact. For 2. It is intentional, no issues here. For 3. I intentionally added this since only http(s) links are allowed. The quicklink is intentionally used to redirect in the browser, so I think it is fine. For 4. Updated. Attaching ver.3 patch. > [Submarine] Add Tensorboard component when --tensorboard is specified > - > > Key: YARN-8757 > URL: https://issues.apache.org/jira/browse/YARN-8757 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8757.001.patch, YARN-8757.002.patch, > YARN-8757.003.patch > > > We need to have a Tensorboard component when --tensorboard is specified. And > we need to set quicklinks to let users view tensorboard. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified
[ https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8757: - Attachment: YARN-8757.003.patch > [Submarine] Add Tensorboard component when --tensorboard is specified > - > > Key: YARN-8757 > URL: https://issues.apache.org/jira/browse/YARN-8757 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8757.001.patch, YARN-8757.002.patch, > YARN-8757.003.patch > > > We need to have a Tensorboard component when --tensorboard is specified. And > we need to set quicklinks to let users view tensorboard. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs
[ https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619451#comment-16619451 ] Jon Bender commented on YARN-8786: -- Ah, I figured we wanted to leave this open until the race conditions on the executor were resolved, but if we feel it's infrequent enough and the user-facing impact is minimal on 3.1.2+ I'm OK duping this into YARN-8751 > LinuxContainerExecutor fails sporadically in create_local_dirs > -- > > Key: YARN-8786 > URL: https://issues.apache.org/jira/browse/YARN-8786 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jon Bender >Priority: Major > > We started using CGroups with LinuxContainerExecutor recently, running Apache > Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn > container will fail with a message like the following: > {code:java} > [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: > Container container_1530684675517_516620_01_020846 transitioned from > SCHEDULED to RUNNING > [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO > monitor.ContainersMonitorImpl: Starting resource-monitoring for > container_1530684675517_516620_01_020846 > [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN > privileged.PrivilegedOperationExecutor: Shell execution returned exit code: > 35. Privileged Execution Operation Stderr: > [2018-09-02 23:48:02.506159] Could not create container dirsCould not create > local files and directories > [2018-09-02 23:48:02.506220] > [2018-09-02 23:48:02.506238] Stdout: main : command provided 1 > [2018-09-02 23:48:02.506258] main : run as user is nobody > [2018-09-02 23:48:02.506282] main : requested yarn user is root > [2018-09-02 23:48:02.506294] Getting exit code file... > [2018-09-02 23:48:02.506307] Creating script paths... > [2018-09-02 23:48:02.506330] Writing pid file... > [2018-09-02 23:48:02.506366] Writing to tmp file > /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp > [2018-09-02 23:48:02.506389] Writing to cgroup task files... > [2018-09-02 23:48:02.506402] Creating local dirs... > [2018-09-02 23:48:02.506414] Getting exit code file... > [2018-09-02 23:48:02.506435] Creating script paths... > {code} > Looking at the container executor source it's traceable to errors here: > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604] > And ultimately to > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672] > The root failure seems to be in the underlying mkdir call, but that exit code > / errno is swallowed so we don't have more details. We tend to see this when > many containers start at the same time for the same application on a host, > and suspect it may be related to some race conditions around those shared > directories between containers for the same application. 
> For example, this is a typical pattern in the audit logs: > {code:java} > [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012870 > [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN > nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - > Failed TARGET=ContainerImplRESULT=FAILURE DESCRIPTION=Container failed > with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > {code} > Two containers for the same application starting in quick succession followed > by the EXITED_WITH_FAILURE step (exit code 35). > We plan to upgrade to 3.1.x soon but I don't expect this to be fixed by this, > the only major JIRAs that affected the executor since 3.0.0 seem unrelated > ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8] > and > [https://github.com/apache/hadoop/commit/a82be7754d74f4d16b206427b91e700bb5f44d56]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619449#comment-16619449 ] Botong Huang commented on YARN-7599: bq. One concern is the response size Yeah we've hit this exact same problem in our federated cluster. That's why I added "RMWSConsts.DESELECTS" (introduced in YARN-6280) when calling getApps to Router in _GPGUtils_. The _ResourceRequest_ entries occupy a significant portion of the appInfo response. In our prod cluster, this deselect reduces the getApps call latency from 20 mins to seconds. > [GPG] ApplicationCleaner in Global Policy Generator > --- > > Key: YARN-7599 > URL: https://issues.apache.org/jira/browse/YARN-7599 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-7599-YARN-7402.v1.patch, > YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, > YARN-7599-YARN-7402.v4.patch > > > In Federation, we need a cleanup service for StateStore as well as Yarn > Registry. For the former, we need to remove old application records. For the > latter, failed and killed applications might leave records in the Yarn > Registry (see YARN-6128). We plan to do both cleanup work in > ApplicationCleaner in GPG -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
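As a point of reference, the deSelects query parameter discussed here (introduced by YARN-6280) can be exercised directly against the cluster apps REST endpoint. The sketch below uses a made-up host and port and plain HttpURLConnection; it only shows the shape of such a request, not the GPG code:
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: fetch the apps list while dropping the bulky ResourceRequest
// entries via deSelects=resourceRequests. Host and port are placeholders.
public class GetAppsDeselectSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://router.example.com:8088/ws/v1/cluster/apps?deSelects=resourceRequests");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // appInfo entries without resourceRequests
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}
Dropping the ResourceRequest entries is what shrinks the response, which matches the latency improvement described in the comment above.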
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.1.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some cleanup, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8789: - Assignee: BELUGA BEHR > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some cleanup, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs
[ https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger resolved YARN-8786. --- Resolution: Duplicate Sounds good. Resolving this as a duplicate of YARN-8751 > LinuxContainerExecutor fails sporadically in create_local_dirs > -- > > Key: YARN-8786 > URL: https://issues.apache.org/jira/browse/YARN-8786 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jon Bender >Priority: Major > > We started using CGroups with LinuxContainerExecutor recently, running Apache > Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn > container will fail with a message like the following: > {code:java} > [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: > Container container_1530684675517_516620_01_020846 transitioned from > SCHEDULED to RUNNING > [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO > monitor.ContainersMonitorImpl: Starting resource-monitoring for > container_1530684675517_516620_01_020846 > [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN > privileged.PrivilegedOperationExecutor: Shell execution returned exit code: > 35. Privileged Execution Operation Stderr: > [2018-09-02 23:48:02.506159] Could not create container dirsCould not create > local files and directories > [2018-09-02 23:48:02.506220] > [2018-09-02 23:48:02.506238] Stdout: main : command provided 1 > [2018-09-02 23:48:02.506258] main : run as user is nobody > [2018-09-02 23:48:02.506282] main : requested yarn user is root > [2018-09-02 23:48:02.506294] Getting exit code file... > [2018-09-02 23:48:02.506307] Creating script paths... > [2018-09-02 23:48:02.506330] Writing pid file... > [2018-09-02 23:48:02.506366] Writing to tmp file > /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp > [2018-09-02 23:48:02.506389] Writing to cgroup task files... > [2018-09-02 23:48:02.506402] Creating local dirs... > [2018-09-02 23:48:02.506414] Getting exit code file... > [2018-09-02 23:48:02.506435] Creating script paths... > {code} > Looking at the container executor source it's traceable to errors here: > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604] > And ultimately to > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672] > The root failure seems to be in the underlying mkdir call, but that exit code > / errno is swallowed so we don't have more details. We tend to see this when > many containers start at the same time for the same application on a host, > and suspect it may be related to some race conditions around those shared > directories between containers for the same application. 
> For example, this is a typical pattern in the audit logs: > {code:java} > [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012870 > [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN > nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - > Failed TARGET=ContainerImplRESULT=FAILURE DESCRIPTION=Container failed > with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > {code} > Two containers for the same application starting in quick succession followed > by the EXITED_WITH_FAILURE step (exit code 35). > We plan to upgrade to 3.1.x soon but I don't expect this to be fixed by this, > the only major JIRAs that affected the executor since 3.0.0 seem unrelated > ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8] > and > [https://github.com/apache/hadoop/commit/a82be7754d74f4d16b206427b91e700bb5f44d56]) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate
[ https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh reassigned YARN-1013: - Assignee: Arun Suresh (was: Weiwei Yang) > CS should watch resource utilization of containers and allocate speculative > containers if appropriate > - > > Key: YARN-1013 > URL: https://issues.apache.org/jira/browse/YARN-1013 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Arun Suresh >Priority: Major > > CS should watch resource utilization of containers (provided by NM in > heartbeat) and allocate speculative containers (at lower OS priority) if > appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8767) TestStreamingStatus fails
[ https://issues.apache.org/jira/browse/YARN-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated YARN-8767: --- Attachment: YARN-8767.003.patch > TestStreamingStatus fails > - > > Key: YARN-8767 > URL: https://issues.apache.org/jira/browse/YARN-8767 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Andras Bokor >Assignee: Andras Bokor >Priority: Major > Attachments: YARN-8767.001.patch, YARN-8767.002.patch, > YARN-8767.003.patch > > > The test tries to connect to RM through 0.0.0.0:8032, but it cannot. > On the console I see the following error message: > {code}Your endpoint configuration is wrong; For more details see: > http://wiki.apache.org/hadoop/UnsetHostnameOrPort, while invoking > ApplicationClientProtocolPBClientImpl.getNewApplication over null after 1 > failover attempts. Trying to failover after sleeping for 44892ms.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs
[ https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619361#comment-16619361 ] Jon Bender commented on YARN-8786: -- Yep, YARN-8751 should definitely lessen the severity of the issue for us and I wouldn't be as concerned about this bug, I just wanted to file it with all the info I had for tracking purposes. I plan to upgrade us to 3.1.2+ anyway in the coming weeks so I don't expect a backport will be necessary for our purposes anyway. > LinuxContainerExecutor fails sporadically in create_local_dirs > -- > > Key: YARN-8786 > URL: https://issues.apache.org/jira/browse/YARN-8786 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jon Bender >Priority: Major > > We started using CGroups with LinuxContainerExecutor recently, running Apache > Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn > container will fail with a message like the following: > {code:java} > [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: > Container container_1530684675517_516620_01_020846 transitioned from > SCHEDULED to RUNNING > [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO > monitor.ContainersMonitorImpl: Starting resource-monitoring for > container_1530684675517_516620_01_020846 > [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN > privileged.PrivilegedOperationExecutor: Shell execution returned exit code: > 35. Privileged Execution Operation Stderr: > [2018-09-02 23:48:02.506159] Could not create container dirsCould not create > local files and directories > [2018-09-02 23:48:02.506220] > [2018-09-02 23:48:02.506238] Stdout: main : command provided 1 > [2018-09-02 23:48:02.506258] main : run as user is nobody > [2018-09-02 23:48:02.506282] main : requested yarn user is root > [2018-09-02 23:48:02.506294] Getting exit code file... > [2018-09-02 23:48:02.506307] Creating script paths... > [2018-09-02 23:48:02.506330] Writing pid file... > [2018-09-02 23:48:02.506366] Writing to tmp file > /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp > [2018-09-02 23:48:02.506389] Writing to cgroup task files... > [2018-09-02 23:48:02.506402] Creating local dirs... > [2018-09-02 23:48:02.506414] Getting exit code file... > [2018-09-02 23:48:02.506435] Creating script paths... > {code} > Looking at the container executor source it's traceable to errors here: > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604] > And ultimately to > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672] > The root failure seems to be in the underlying mkdir call, but that exit code > / errno is swallowed so we don't have more details. We tend to see this when > many containers start at the same time for the same application on a host, > and suspect it may be related to some race conditions around those shared > directories between containers for the same application. 
> For example, this is a typical pattern in the audit logs: > {code:java} > [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012870 > [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN > nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - > Failed TARGET=ContainerImplRESULT=FAILURE DESCRIPTION=Container failed > with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > {code} > Two containers for the same application starting in quick succession followed > by the EXITED_WITH_FAILURE step (exit code 35). > We plan to upgrade to 3.1.x soon but I don't expect this to be fixed by this, > the only major JIRAs that affected the executor since 3.0.0 seem unrelated > ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8] > and > [https://github.com/apache/hadoop/commit/a82be7754d74f4d16b206427b91e700bb5f44d56]) -- This message was
[jira] [Comment Edited] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs
[ https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619361#comment-16619361 ] Jon Bender edited comment on YARN-8786 at 9/18/18 4:28 PM: --- Yep, YARN-8751 should definitely lessen the severity of the issue for us and I wouldn't be as concerned about this bug, I just wanted to file it with all the info I had for tracking purposes. I plan to upgrade us to 3.1.2+ anyway in the coming weeks so I don't expect a backport will be necessary for our purposes. was (Author: jonbender-stripe): Yep, YARN-8751 should definitely lessen the severity of the issue for us and I wouldn't be as concerned about this bug, I just wanted to file it with all the info I had for tracking purposes. I plan to upgrade us to 3.1.2+ anyway in the coming weeks so I don't expect a backport will be necessary for our purposes anyway. > LinuxContainerExecutor fails sporadically in create_local_dirs > -- > > Key: YARN-8786 > URL: https://issues.apache.org/jira/browse/YARN-8786 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jon Bender >Priority: Major > > We started using CGroups with LinuxContainerExecutor recently, running Apache > Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn > container will fail with a message like the following: > {code:java} > [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: > Container container_1530684675517_516620_01_020846 transitioned from > SCHEDULED to RUNNING > [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO > monitor.ContainersMonitorImpl: Starting resource-monitoring for > container_1530684675517_516620_01_020846 > [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN > privileged.PrivilegedOperationExecutor: Shell execution returned exit code: > 35. Privileged Execution Operation Stderr: > [2018-09-02 23:48:02.506159] Could not create container dirsCould not create > local files and directories > [2018-09-02 23:48:02.506220] > [2018-09-02 23:48:02.506238] Stdout: main : command provided 1 > [2018-09-02 23:48:02.506258] main : run as user is nobody > [2018-09-02 23:48:02.506282] main : requested yarn user is root > [2018-09-02 23:48:02.506294] Getting exit code file... > [2018-09-02 23:48:02.506307] Creating script paths... > [2018-09-02 23:48:02.506330] Writing pid file... > [2018-09-02 23:48:02.506366] Writing to tmp file > /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp > [2018-09-02 23:48:02.506389] Writing to cgroup task files... > [2018-09-02 23:48:02.506402] Creating local dirs... > [2018-09-02 23:48:02.506414] Getting exit code file... > [2018-09-02 23:48:02.506435] Creating script paths... > {code} > Looking at the container executor source it's traceable to errors here: > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604] > And ultimately to > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672] > The root failure seems to be in the underlying mkdir call, but that exit code > / errno is swallowed so we don't have more details. 
We tend to see this when > many containers start at the same time for the same application on a host, > and suspect it may be related to some race conditions around those shared > directories between containers for the same application. > For example, this is a typical pattern in the audit logs: > {code:java} > [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012870 > [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN > nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - > Failed TARGET=ContainerImplRESULT=FAILURE DESCRIPTION=Container failed > with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > {code} > Two containers for the same application starting in quick succession followed > by the
[jira] [Created] (YARN-8789) Add BoundedQueue to AsyncDispatcher
BELUGA BEHR created YARN-8789: - Summary: Add BoundedQueue to AsyncDispatcher Key: YARN-8789 URL: https://issues.apache.org/jira/browse/YARN-8789 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 3.2.0 Reporter: BELUGA BEHR I recently came across a scenario where an MR ApplicationMaster was failing with an OOM exception. It had many thousands of Mappers and thousands of Reducers. It was noted in the logging that the event-queue of {{AsyncDispatcher}} had a very large number of items in it and was seemingly never decreasing. I started looking at the code and thought it could use some cleanup, simplification, and the ability to specify a bounded queue so that any incoming events are throttled until they can be processed. This will protect the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
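To illustrate the throttling behavior being proposed (this is not the attached patch), a bounded event queue can be as simple as a capacity-limited {{LinkedBlockingQueue}} whose blocking {{put}} makes producers wait instead of letting the backlog, and the heap, grow without bound:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a bounded event queue: when the consumer falls behind, put()
// blocks the producer instead of letting the backlog grow until OOM.
public class BoundedEventQueueSketch {
  private final BlockingQueue<Runnable> eventQueue;

  public BoundedEventQueueSketch(int capacity) {
    this.eventQueue = new LinkedBlockingQueue<>(capacity);
  }

  // Producer side: blocks once 'capacity' events are pending.
  public void dispatch(Runnable event) throws InterruptedException {
    eventQueue.put(event);
  }

  // Consumer side: typically run by a single dispatcher thread.
  public void processLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      Runnable event = eventQueue.take();
      event.run();
    }
  }
}
{code}
The trade-off is that a blocked producer can itself stall other work, so in practice the capacity would need to be configurable rather than hard-coded.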
[jira] [Comment Edited] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator
[ https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618519#comment-16618519 ] Bibin A Chundatt edited comment on YARN-7599 at 9/18/18 3:30 PM: - Thank you [~botong] for the updated patch {quote} I should have excluded all apps that are still in YarnRM memory. That should eliminate the race condition you mentioned. What do you think? {quote} Yes, I agree with that for fixing the race. One concern is the response size: since all applications are fetched from the overall cluster, we could be fetching, in the worst case, 5k x (number of subclusters) applications. was (Author: bibinchundatt): Thank you [~botong] for updated patch {quote} I should have excluded all apps that are still in YarnRM memory. That should eliminate the race condition you mentioned. What do you think? {quote} Yes , I aggree with the same. > [GPG] ApplicationCleaner in Global Policy Generator > --- > > Key: YARN-7599 > URL: https://issues.apache.org/jira/browse/YARN-7599 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Labels: federation, gpg > Attachments: YARN-7599-YARN-7402.v1.patch, > YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, > YARN-7599-YARN-7402.v4.patch > > > In Federation, we need a cleanup service for StateStore as well as Yarn > Registry. For the former, we need to remove old application records. For the > latter, failed and killed applications might leave records in the Yarn > Registry (see YARN-6128). We plan to do both cleanup work in > ApplicationCleaner in GPG -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8635) Container Resource localization fails if umask is 077
[ https://issues.apache.org/jira/browse/YARN-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619284#comment-16619284 ] Jason Lowe commented on YARN-8635: -- Thanks for the patch! It would be nice to have a short comment explaining why the umask setting is necessary, as it will not be obvious to many why it's there. There should also be a unit test so we don't accidentally regress this fix. It should be easy to extend the existing test_init_app test in test-container-executor. > Container Resource localization fails if umask is 077 > - > > Key: YARN-8635 > URL: https://issues.apache.org/jira/browse/YARN-8635 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-8635-001.patch > > > {code} > java.io.IOException: Application application_1533652359071_0001 > initialization failed (exitCode=255) with output: main : command provided 0 > main : run as user is mapred > main : requested yarn user is mapred > Path > /opt/HA/OSBR310/nmlocal/usercache/mapred/appcache/application_1533652359071_0001 > has permission 700 but needs permission 750. > Did not create any app directories > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:411) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229) > Caused by: > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402) > ... 1 more > Caused by: ExitCodeException exitCode=255: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009) > at org.apache.hadoop.util.Shell.run(Shell.java:902) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) > ... 
2 more > 2018-08-08 17:43:26,918 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_e04_1533652359071_0001_01_27 transitioned from > LOCALIZING to LOCALIZATION_FAILED > 2018-08-08 17:43:26,916 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_e04_1533652359071_0001_01_31 startLocalizer is : > 255 > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: > ExitCodeException exitCode=255: > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229) > Caused by: ExitCodeException exitCode=255: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009) > at org.apache.hadoop.util.Shell.run(Shell.java:902) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) > ... 2 more > 2018-08-08 17:43:26,923 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed for containe > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs
[ https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619248#comment-16619248 ] Eric Badger commented on YARN-8786: --- As [~shaneku...@gmail.com] said on the mailing list, this should be fixed in 3.1.2 by YARN-8751. The container-executor will continue to report the same error, but the Nodemanager will not mark itself as unhealthy if it receives an error code that corresponds to not being able to create directories. We could pull the patch back to 3.0.x if you think it's necessary. > LinuxContainerExecutor fails sporadically in create_local_dirs > -- > > Key: YARN-8786 > URL: https://issues.apache.org/jira/browse/YARN-8786 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jon Bender >Priority: Major > > We started using CGroups with LinuxContainerExecutor recently, running Apache > Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn > container will fail with a message like the following: > {code:java} > [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: > Container container_1530684675517_516620_01_020846 transitioned from > SCHEDULED to RUNNING > [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO > monitor.ContainersMonitorImpl: Starting resource-monitoring for > container_1530684675517_516620_01_020846 > [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN > privileged.PrivilegedOperationExecutor: Shell execution returned exit code: > 35. Privileged Execution Operation Stderr: > [2018-09-02 23:48:02.506159] Could not create container dirsCould not create > local files and directories > [2018-09-02 23:48:02.506220] > [2018-09-02 23:48:02.506238] Stdout: main : command provided 1 > [2018-09-02 23:48:02.506258] main : run as user is nobody > [2018-09-02 23:48:02.506282] main : requested yarn user is root > [2018-09-02 23:48:02.506294] Getting exit code file... > [2018-09-02 23:48:02.506307] Creating script paths... > [2018-09-02 23:48:02.506330] Writing pid file... > [2018-09-02 23:48:02.506366] Writing to tmp file > /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp > [2018-09-02 23:48:02.506389] Writing to cgroup task files... > [2018-09-02 23:48:02.506402] Creating local dirs... > [2018-09-02 23:48:02.506414] Getting exit code file... > [2018-09-02 23:48:02.506435] Creating script paths... > {code} > Looking at the container executor source it's traceable to errors here: > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604] > And ultimately to > [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672] > The root failure seems to be in the underlying mkdir call, but that exit code > / errno is swallowed so we don't have more details. We tend to see this when > many containers start at the same time for the same application on a host, > and suspect it may be related to some race conditions around those shared > directories between containers for the same application. 
> For example, this is a typical pattern in the audit logs: > {code:java} > [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO > nodemanager.NMAuditLogger: USER=root IP=<> Container Request > TARGET=ContainerManageImpl RESULT=SUCCESS > APPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012870 > [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN > nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - > Failed TARGET=ContainerImplRESULT=FAILURE DESCRIPTION=Container failed > with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126 > CONTAINERID=container_1530684675517_559126_01_012871 > {code} > Two containers for the same application starting in quick succession followed > by the EXITED_WITH_FAILURE step (exit code 35). > We plan to upgrade to 3.1.x soon but I don't expect this to be fixed by this, > the only major JIRAs that affected the executor since 3.0.0 seem unrelated > ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8] > and >
[jira] [Comment Edited] (YARN-8553) Reduce complexity of AHSWebService getApps method
[ https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619232#comment-16619232 ] Antal Bálint Steinbach edited comment on YARN-8553 at 9/18/18 2:59 PM: --- Hi [~snemeth] , Thanks for creating the patch. Your changes cleaning the code for sure. I found some minor things to consider: 1) I would set the boolean fields default value in _ApplicationsRequestBuilderCommons_ to false. It is more straightforward in that way. {code:java} private boolean startedTimeSpecified; private boolean finishedTimeSpecified; private boolean appStatesSpecified; private boolean appTypesSpecified;{code} 2) in _ApplicationsRequestBuilderCommons_ {code:java} Set appStates = ApplicationsRequestValueParser .parseApplicationStates(this.appStates); {code} appStates is already a field in the class. 3) _TestApplicationsRequestBuilderCommons_ it would be easier to maintain and read if you don't check all the "builder setters" one by one. In my opinion, it is enough to keep the one which tests all of them together, plus the ones which are testing edge cases or invalid values. was (Author: bsteinbach): Hi [~snemeth] , Thanks for creating the patch. Your changes cleaning the code for sure. I found some minor things to consider: 1) I would set the boolean fields default value in _ApplicationsRequestBuilderCommons_ to false. It is more straightforward in that way. {code:java} private boolean startedTimeSpecified; private boolean finishedTimeSpecified; private boolean appStatesSpecified; private boolean appTypesSpecified;{code} 2) in _ApplicationsRequestBuilderCommons_ {code:java} // Set appStates = ApplicationsRequestValueParser .parseApplicationStates(this.appStates); {code} appStates is already a field in the class. 3) _TestApplicationsRequestBuilderCommons_ it would be easier to maintain and read if you don't check all the "builder setters" one by one. In my opinion, it is enough to keep the one which tests all of them together, plus the ones which are testing edge cases or invalid values. > Reduce complexity of AHSWebService getApps method > - > > Key: YARN-8553 > URL: https://issues.apache.org/jira/browse/YARN-8553 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8553.001.patch, YARN-8553.001.patch, > YARN-8553.002.patch > > > YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in > AHSWebservice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8553) Reduce complexity of AHSWebService getApps method
[ https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619232#comment-16619232 ] Antal Bálint Steinbach commented on YARN-8553: -- Hi [~snemeth] , Thanks for creating the patch. Your changes cleaning the code for sure. I found some minor things to consider: 1) I would set the boolean fields default value in _ApplicationsRequestBuilderCommons_ to false. It is more straightforward in that way. {code:java} private boolean startedTimeSpecified; private boolean finishedTimeSpecified; private boolean appStatesSpecified; private boolean appTypesSpecified;{code} 2) in _ApplicationsRequestBuilderCommons_ {code:java} // Set appStates = ApplicationsRequestValueParser .parseApplicationStates(this.appStates); {code} appStates is already a field in the class. 3) _TestApplicationsRequestBuilderCommons_ it would be easier to maintain and read if you don't check all the "builder setters" one by one. In my opinion, it is enough to keep the one which tests all of them together, plus the ones which are testing edge cases or invalid values. > Reduce complexity of AHSWebService getApps method > - > > Key: YARN-8553 > URL: https://issues.apache.org/jira/browse/YARN-8553 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8553.001.patch, YARN-8553.001.patch, > YARN-8553.002.patch > > > YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in > AHSWebservice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619208#comment-16619208 ] Weiwei Yang commented on YARN-8771: --- OK, if that's the case, I am fine with the patch. LGTM, +1, if we don't get any further comments, I will commit the patch by tomorrow. Thanks [~Tao Yang]. > CapacityScheduler fails to unreserve when cluster resource contains empty > resource type > --- > > Key: YARN-8771 > URL: https://issues.apache.org/jira/browse/YARN-8771 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8771.001.patch, YARN-8771.002.patch > > > We found this problem when cluster is almost but not exhausted (93% used), > scheduler kept allocating for an app but always fail to commit, this can > blocking requests from other apps and parts of cluster resource can't be used. > Reproduce this problem: > (1) use DominantResourceCalculator > (2) cluster resource has empty resource type, for example: gpu=0 > (3) scheduler allocates container for app1 who has reserved containers and > whose queue limit or user limit reached(used + required > limit). > Reference codes in RegularContainerAllocator#assignContainer: > {code:java} > // How much need to unreserve equals to: > // max(required - headroom, amountNeedUnreserve) > Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom()); > Resource resourceNeedToUnReserve = > Resources.max(rc, clusterResource, > Resources.subtract(capability, headRoom), > currentResoureLimits.getAmountNeededUnreserve()); > boolean needToUnreserve = > Resources.greaterThan(rc, clusterResource, > resourceNeedToUnReserve, Resources.none()); > {code} > For example, resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu> when > {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}, > needToUnreserve which is the result of {{Resources#greaterThan}} will be > {{false}}. This is not reasonable because required resource did exceed the > headroom and unreserve is needed. > After that, when reaching the unreserve process in > RegularContainerAllocator#assignContainer, unreserve process will be skipped > when shouldAllocOrReserveNewContainer is true (when required containers > > reserved containers) and needToUnreserve is wrongly calculated to be false: > {code:java} > if (availableContainers > 0) { > if (rmContainer == null && reservationsContinueLooking > && node.getLabels().isEmpty()) { > // unreserve process can be wrongly skipped when > shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required > resource did exceed the headroom > if (!shouldAllocOrReserveNewContainer || needToUnreserve) { > ... > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type
[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619181#comment-16619181 ] Tao Yang commented on YARN-8771: Thanks [~cheersyang] for the review. {quote} Instead of adding a new "resource-types-1.xml", can we use TestResourceUtils#addNewTypesToResources for the tests? I think it doesn't matter to test with existing gpu or fpga resource correct? {quote} IIUC, MockRM will reset resource types and reload resource-types.xml internally, without resource-types.xml, MockRM will only have two resource types (memory and vcores), so that we can't simulate that cluster contains empty resource type. Thoughts? > CapacityScheduler fails to unreserve when cluster resource contains empty > resource type > --- > > Key: YARN-8771 > URL: https://issues.apache.org/jira/browse/YARN-8771 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8771.001.patch, YARN-8771.002.patch > > > We found this problem when cluster is almost but not exhausted (93% used), > scheduler kept allocating for an app but always fail to commit, this can > blocking requests from other apps and parts of cluster resource can't be used. > Reproduce this problem: > (1) use DominantResourceCalculator > (2) cluster resource has empty resource type, for example: gpu=0 > (3) scheduler allocates container for app1 who has reserved containers and > whose queue limit or user limit reached(used + required > limit). > Reference codes in RegularContainerAllocator#assignContainer: > {code:java} > // How much need to unreserve equals to: > // max(required - headroom, amountNeedUnreserve) > Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom()); > Resource resourceNeedToUnReserve = > Resources.max(rc, clusterResource, > Resources.subtract(capability, headRoom), > currentResoureLimits.getAmountNeededUnreserve()); > boolean needToUnreserve = > Resources.greaterThan(rc, clusterResource, > resourceNeedToUnReserve, Resources.none()); > {code} > For example, resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu> when > {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}, > needToUnreserve which is the result of {{Resources#greaterThan}} will be > {{false}}. This is not reasonable because required resource did exceed the > headroom and unreserve is needed. > After that, when reaching the unreserve process in > RegularContainerAllocator#assignContainer, unreserve process will be skipped > when shouldAllocOrReserveNewContainer is true (when required containers > > reserved containers) and needToUnreserve is wrongly calculated to be false: > {code:java} > if (availableContainers > 0) { > if (rmContainer == null && reservationsContinueLooking > && node.getLabels().isEmpty()) { > // unreserve process can be wrongly skipped when > shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required > resource did exceed the headroom > if (!shouldAllocOrReserveNewContainer || needToUnreserve) { > ... > } > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619110#comment-16619110 ] Hadoop QA commented on YARN-8468: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 17 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 21s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 33s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 18 new + 936 unchanged - 24 fixed = 954 total (was 960) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 73m 41s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 15s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}144m 23s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8468 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940195/YARN-8468.013.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs
[jira] [Commented] (YARN-8750) Refactor TestQueueMetrics
[ https://issues.apache.org/jira/browse/YARN-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619061#comment-16619061 ] Zoltan Siegl commented on YARN-8750: Thanks for the patch, +1 LGTM (non-binding) > Refactor TestQueueMetrics > - > > Key: YARN-8750 > URL: https://issues.apache.org/jira/browse/YARN-8750 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-8750.001.patch, YARN-8750.002.patch, > YARN-8750.003.patch > > > {{TestQueueMetrics#checkApps}} and {{TestQueueMetrics#checkResources}} have 8 > and 14 parameters, respectively. > It is very hard to read the testcases that are using these methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8783) Improve the documentation for the docker.trusted.registries configuration
[ https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-8783: -- Summary: Improve the documentation for the docker.trusted.registries configuration (was: Property docker.trusted.registries does not work when using a list) > Improve the documentation for the docker.trusted.registries configuration > - > > Key: YARN-8783 > URL: https://issues.apache.org/jira/browse/YARN-8783 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Simon Prewo >Priority: Major > Labels: Docker, container-executor, docker > > I am deploying the default yarn distributed shell example: > {code:java} > yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar > hadoop-yarn-applications-distributedshell.jar -num_containers 1{code} > Having a *single trusted registry configured like this works*: > {code:java} > docker.trusted.registries=centos{code} > But having *a list of trusted registries configured fails* ("Shell error > output: image: centos is not trusted."): > {code:java} > docker.trusted.registries=centos,ubuntu{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618935#comment-16618935 ] Antal Bálint Steinbach commented on YARN-8468: -- Hi [~cheersyang], thanks for the feedback. I rebased the patch onto the current trunk and resolved the conflicts. 1) Fixed a lot of related checkstyle issues. 2) Added a null check for appAttempt; queueName can be null. If it is null, then _scheduler.getMaximumResourceCapability()_ will be called by default in the schedulers. > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, > YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, > YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, > YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited by queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per-queue basis. > > The use case: a user has two pools, one for ad hoc jobs and one for enterprise > apps. The user wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default value for the maximum > container size for all queues; the maximum resources per queue are set with the > “maxContainerResources” queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf); this will cover dynamically created queues. > * if we set it on the root, we override the scheduler setting, and we should > not allow that. > * make sure that the queue resource cap cannot be larger than the scheduler max > resource cap in the config. > * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc. for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8468: - Attachment: YARN-8468.013.patch > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, > YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, > YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, > YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers, cannot be limited by queue, and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per-queue basis. > > The use case: a user has two pools, one for ad hoc jobs and one for enterprise > apps. The user wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default value for the maximum > container size for all queues; the maximum resources per queue are set with the > “maxContainerResources” queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf); this will cover dynamically created queues. > * if we set it on the root, we override the scheduler setting, and we should > not allow that. > * make sure that the queue resource cap cannot be larger than the scheduler max > resource cap in the config. > * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc. for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
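The suggested solution in YARN-8468 boils down to looking up a per-queue maximum and checking container requests against it. Purely as a hypothetical sketch of that idea (this is not the attached patch; the simplified {{Resource}} class, method names, and numbers are invented), the check could look roughly like this:

{code:java}
// Hypothetical sketch only: enforce a per-queue maximum container size, falling
// back to the scheduler-wide maximum when the queue does not configure one.
public class QueueMaxAllocationSketch {
  static final class Resource {
    final long memoryMb;
    final int vcores;
    Resource(long memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
    boolean fitsIn(Resource max) { return memoryMb <= max.memoryMb && vcores <= max.vcores; }
  }

  // Stand-in for a getMaximumResourceCapability(String queueName) style lookup.
  static Resource maxAllocationFor(Resource queueMax, Resource schedulerMax) {
    return queueMax != null ? queueMax : schedulerMax;
  }

  public static void main(String[] args) {
    Resource schedulerMax = new Resource(8192, 8);   // cluster-wide maximum allocation
    Resource adHocMax     = new Resource(2048, 2);   // per-queue cap for an "ad hoc" queue
    Resource request      = new Resource(4096, 1);   // container request from an ad hoc job

    Resource effectiveMax = maxAllocationFor(adHocMax, schedulerMax);
    if (!request.fitsIn(effectiveMax)) {
      System.out.println("Request exceeds the queue maximum: reject or normalize it.");
    }
  }
}
{code}

As the list in the description notes, a real implementation would also have to validate that a queue-level cap never exceeds the scheduler-wide cap.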
[jira] [Updated] (YARN-8749) Restrict job submission to queue based on apptype
[ https://issues.apache.org/jira/browse/YARN-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksandr Shevchenko updated YARN-8749: --- Attachment: YARN-8749.001.patch > Restrict job submission to queue based on apptype > - > > Key: YARN-8749 > URL: https://issues.apache.org/jira/browse/YARN-8749 > Project: Hadoop YARN > Issue Type: New Feature > Components: RM, scheduler >Reporter: Oleksandr Shevchenko >Assignee: Oleksandr Shevchenko >Priority: Minor > Attachments: YARN-8749.001.patch > > > The proposal here is to add a new property for queue tuning that allows submitting > an application to a queue only if its type is among the allowed types. If an > application has a type different from the queue's allowed types, the application > should be rejected. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8749) Restrict job submission to queue based on apptype
[ https://issues.apache.org/jira/browse/YARN-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618929#comment-16618929 ] Oleksandr Shevchenko commented on YARN-8749: Attached a patch to demonstrate the approach. [^YARN-8749.001.patch] Could someone evaluate this feature and approach? Thanks. > Restrict job submission to queue based on apptype > - > > Key: YARN-8749 > URL: https://issues.apache.org/jira/browse/YARN-8749 > Project: Hadoop YARN > Issue Type: New Feature > Components: RM, scheduler >Reporter: Oleksandr Shevchenko >Assignee: Oleksandr Shevchenko >Priority: Minor > Attachments: YARN-8749.001.patch > > > The proposal here is to add a new property for queue tuning that allows submitting > an application to a queue only if its type is among the allowed types. If an > application has a type different from the queue's allowed types, the application > should be rejected. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
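The rejection rule proposed in YARN-8749 is essentially a membership test against a per-queue allow-list of application types. A rough, hypothetical illustration follows (not the attached patch; the property shape and names are invented):

{code:java}
import java.util.Set;

// Hypothetical illustration only: reject a submission whose application type is
// not in the queue's configured allow-list.
public class QueueAppTypeFilterSketch {
  static boolean isSubmissionAllowed(Set<String> allowedTypes, String appType) {
    // Treat a missing or empty allow-list as "accept every application type".
    return allowedTypes == null || allowedTypes.isEmpty() || allowedTypes.contains(appType);
  }

  public static void main(String[] args) {
    Set<String> allowed = Set.of("MAPREDUCE", "SPARK");
    System.out.println(isSubmissionAllowed(allowed, "MAPREDUCE")); // true
    System.out.println(isSubmissionAllowed(allowed, "TEZ"));       // false -> reject the app
  }
}
{code}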
[jira] [Commented] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications
[ https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618910#comment-16618910 ] Hadoop QA commented on YARN-8775: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: The patch generated 0 new + 7 unchanged - 1 fixed = 7 total (was 8) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 6s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 7s{color} | {color:green} hadoop-yarn-server-tests in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 46m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8775 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940176/YARN-8775.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 038b6e61f4b1 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f796cfd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | Test Results |
[jira] [Comment Edited] (YARN-8783) Property docker.trusted.registries does not work when using a list
[ https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618872#comment-16618872 ] Simon Prewo edited comment on YARN-8783 at 9/18/18 10:35 AM: - [~shaneku...@gmail.com], [~eyang] Thanks a lot for you great support. [~shaneku...@gmail.com]: I tried the library-approach and it works. However, I stayed with tagging for now. Let me summarize for others (findings this on Google) what helped: 1) Pull centos image and tag it: {code:java} docker pull centos && docker tag centos local/centos{code} 2) Add _local_ repository to docker.trusted.registries in container-executor.cfg {code:java} [docker] ... docker.trusted.registries=local {code} 3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case local/centos). I.e. execute distributed shell like this: {code:java} yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos -shell_command "sleep 90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1 {code} was (Author: simonprewo): [~shaneku...@gmail.com], [~eyang] Thanks a lot for you great support. [~shaneku...@gmail.com]: I tried the library-approach and it works. However, I stayed with tagging for now. Let me summarize for others (findings this on Google) what helped: 1) Pull centos image and tag it: {code:java} docker pull centos && docker tag centos local/centos:latest{code} 2) Add _local_ repository to docker.trusted.registries in container-executor.cfg {code:java} [docker] ... docker.trusted.registries=local {code} 3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case local/centos:latest). I.e. execute distributed shell like this: {code:java} yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos:latest -shell_command "sleep 90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1 {code} > Property docker.trusted.registries does not work when using a list > -- > > Key: YARN-8783 > URL: https://issues.apache.org/jira/browse/YARN-8783 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Simon Prewo >Priority: Major > Labels: Docker, container-executor, docker > > I am deploying the default yarn distributed shell example: > {code:java} > yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar > hadoop-yarn-applications-distributedshell.jar -num_containers 1{code} > Having a *single trusted registry configured like this works*: > {code:java} > docker.trusted.registries=centos{code} > But having *a list of trusted registries configured fails* ("Shell error > output: image: centos is not trusted."): > {code:java} > docker.trusted.registries=centos,ubuntu{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8653) Wrong display of resources when cluster resources are less than min resources
[ https://issues.apache.org/jira/browse/YARN-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618876#comment-16618876 ] Jinjiang Ling commented on YARN-8653: - Hi [~haibochen], could you help to review this? > Wrong display of resources when cluster resources are less than min resources > - > > Key: YARN-8653 > URL: https://issues.apache.org/jira/browse/YARN-8653 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Jinjiang Ling >Assignee: Jinjiang Ling >Priority: Major > Attachments: YARN-8653.001.patch, YARN-8653.001.patch, > wrong_resource_in_fairscheduler.JPG > > > If the cluster resources are less than the min resources of the Fair Scheduler, a > display error like this will happen. > > !wrong_resource_in_fairscheduler.JPG! > In this case, I configured my queue with max resources of 48 vcores, 49152 MB and > min resources of 36 vcores, 36864 MB. But the cluster resources are only 24 > vcores and 24576 MB. The max resources are then capped at the cluster > resources, but the min resources and steady fair share still show the configured > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8783) Property docker.trusted.registries does not work when using a list
[ https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618872#comment-16618872 ] Simon Prewo edited comment on YARN-8783 at 9/18/18 10:26 AM: - [~shaneku...@gmail.com], [~eyang] Thanks a lot for you great support. [~shaneku...@gmail.com]: I tried the library-approach and it works. However, I stayed with tagging for now. Let me summarize for others (findings this on Google) what helped: 1) Pull centos image and tag it: {code:java} docker pull centos && docker tag centos local/centos:latest{code} 2) Add _local_ repository to docker.trusted.registries in container-executor.cfg {code:java} [docker] ... docker.trusted.registries=local {code} 3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case local/centos:latest). I.e. execute distributed shell like this: {code:java} yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos:latest -shell_command "sleep 90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1 {code} was (Author: simonprewo): [~shaneku...@gmail.com], [~eyang] Thanks a lot for you great support. Let me summarize for others (findings this on Google) what helped: 1) Pull centos image and tag it: {code:java} docker pull centos && docker tag centos local/centos:latest{code} 2) Add _local_ repository to docker.trusted.registries in container-executor.cfg {code:java} [docker] ... docker.trusted.registries=local {code} 3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case local/centos:latest). I.e. execute distributed shell like this: {code:java} yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos:latest -shell_command "sleep 90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1{code} > Property docker.trusted.registries does not work when using a list > -- > > Key: YARN-8783 > URL: https://issues.apache.org/jira/browse/YARN-8783 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Simon Prewo >Priority: Major > Labels: Docker, container-executor, docker > > I am deploying the default yarn distributed shell example: > {code:java} > yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar > hadoop-yarn-applications-distributedshell.jar -num_containers 1{code} > Having a *single trusted registry configured like this works*: > {code:java} > docker.trusted.registries=centos{code} > But having *a list of trusted registries configured fails* ("Shell error > output: image: centos is not trusted."): > {code:java} > docker.trusted.registries=centos,ubuntu{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8783) Property docker.trusted.registries does not work when using a list
[ https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618872#comment-16618872 ] Simon Prewo commented on YARN-8783: --- [~shaneku...@gmail.com], [~eyang] Thanks a lot for you great support. Let me summarize for others (findings this on Google) what helped: 1) Pull centos image and tag it: {code:java} docker pull centos && docker tag centos local/centos:latest{code} 2) Add _local_ repository to docker.trusted.registries in container-executor.cfg {code:java} [docker] ... docker.trusted.registries=local {code} 3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case local/centos:latest). I.e. execute distributed shell like this: {code:java} yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos:latest -shell_command "sleep 90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1{code} > Property docker.trusted.registries does not work when using a list > -- > > Key: YARN-8783 > URL: https://issues.apache.org/jira/browse/YARN-8783 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Simon Prewo >Priority: Major > Labels: Docker, container-executor, docker > > I am deploying the default yarn distributed shell example: > {code:java} > yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar > hadoop-yarn-applications-distributedshell.jar -num_containers 1{code} > Having a *single trusted registry configured like this works*: > {code:java} > docker.trusted.registries=centos{code} > But having *a list of trusted registries configured fails* ("Shell error > output: image: centos is not trusted."): > {code:java} > docker.trusted.registries=centos,ubuntu{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints documentation
[ https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618810#comment-16618810 ] Hudson commented on YARN-8787: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14992 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14992/]) YARN-8787. Fix broken list items in PlacementConstraints documentation. (wwei: rev 78a0d173e4f0c2f2679a04edd62a60fb76dde4f0) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm > Fix broken list items in PlacementConstraints documentation > --- > > Key: YARN-8787 > URL: https://issues.apache.org/jira/browse/YARN-8787 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.1 >Reporter: Masahiro Tanaka >Assignee: Masahiro Tanaka >Priority: Minor > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, > listitems1.PNG > > > It looks like some parts of the document below should be list items. > https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html > It might be because of missing newlines before listing. > https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications
[ https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8775: - Attachment: YARN-8775.002.patch > TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File > modifications > -- > > Key: YARN-8775 > URL: https://issues.apache.org/jira/browse/YARN-8775 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn >Affects Versions: 3.0.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-8775.001.patch, YARN-8775.002.patch > > > The test can fail sometimes when file operations were done during the check > done by the thread in _LocalDirsHandlerService._ > {code:java} > java.lang.AssertionError: NodeManager could not identify disk failure. > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239) > at > org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202) > at > org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99) > Stderr > 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures > (TestDiskFailures.java:prepareDirToFail(277)) - Prepared > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1 > to fail. > 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures > (TestDiskFailures.java:prepareDirToFail(277)) - Prepared > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > to fail. 
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] > nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(283)) - > Directory > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1 > error, Not a directory: > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1, > removing from list of valid directories > 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] > localizer.ResourceLocalizationService > (ResourceLocalizationService.java:initializeLogDir(1329)) - Could not > initialize log dir > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > java.io.FileNotFoundException: Destination exists and is not a directory: > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > at > org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515) > at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496) > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDirs(ResourceLocalizationService.java:1318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$000(ResourceLocalizationService.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$2.onDirsChanged(ResourceLocalizationService.java:269) > at >
[jira] [Commented] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications
[ https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618803#comment-16618803 ] Antal Bálint Steinbach commented on YARN-8775: -- Hi [~snemeth] , Thanks for the comments. All issues fixed. > TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File > modifications > -- > > Key: YARN-8775 > URL: https://issues.apache.org/jira/browse/YARN-8775 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn >Affects Versions: 3.0.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-8775.001.patch, YARN-8775.002.patch > > > The test can fail sometimes when file operations were done during the check > done by the thread in _LocalDirsHandlerService._ > {code:java} > java.lang.AssertionError: NodeManager could not identify disk failure. > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239) > at > org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202) > at > org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99) > Stderr > 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures > (TestDiskFailures.java:prepareDirToFail(277)) - Prepared > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1 > to fail. > 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures > (TestDiskFailures.java:prepareDirToFail(277)) - Prepared > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > to fail. 
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] > nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(283)) - > Directory > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1 > error, Not a directory: > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1, > removing from list of valid directories > 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] > localizer.ResourceLocalizationService > (ResourceLocalizationService.java:initializeLogDir(1329)) - Could not > initialize log dir > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > java.io.FileNotFoundException: Destination exists and is not a directory: > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > at > org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515) > at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496) > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDirs(ResourceLocalizationService.java:1318) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$000(ResourceLocalizationService.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$2.onDirsChanged(ResourceLocalizationService.java:269) > at >
[jira] [Commented] (YARN-8653) Wrong display of resources when cluster resources are less than min resources
[ https://issues.apache.org/jira/browse/YARN-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618786#comment-16618786 ] Weiwei Yang commented on YARN-8653: --- Hi [~lingjinjiang], I am not familiar with the fair scheduler code; pinging [~haibochen], he should be able to help review this. Thanks > Wrong display of resources when cluster resources are less than min resources > - > > Key: YARN-8653 > URL: https://issues.apache.org/jira/browse/YARN-8653 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Jinjiang Ling >Assignee: Jinjiang Ling >Priority: Major > Attachments: YARN-8653.001.patch, YARN-8653.001.patch, > wrong_resource_in_fairscheduler.JPG > > > If the cluster resources are less than the min resources of the Fair Scheduler, a > display error like this will happen. > > !wrong_resource_in_fairscheduler.JPG! > In this case, I configured my queue with max resources of 48 vcores, 49152 MB and > min resources of 36 vcores, 36864 MB. But the cluster resources are only 24 > vcores and 24576 MB. The max resources are then capped at the cluster > resources, but the min resources and steady fair share still show the configured > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints documentation
[ https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618781#comment-16618781 ] Masahiro Tanaka commented on YARN-8787: --- Thank you for re-uploading and reviewing [~ajisakaa] ! Thank you for committing [~cheersyang] ! > Fix broken list items in PlacementConstraints documentation > --- > > Key: YARN-8787 > URL: https://issues.apache.org/jira/browse/YARN-8787 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.1 >Reporter: Masahiro Tanaka >Assignee: Masahiro Tanaka >Priority: Minor > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, > listitems1.PNG > > > It looks like some parts of the document below should be list items. > https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html > It might be because of missing newlines before listing. > https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8653) Wrong display of resources when cluster resources are less than min resources
[ https://issues.apache.org/jira/browse/YARN-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618768#comment-16618768 ] Jinjiang Ling commented on YARN-8653: - Hi [~cheersyang], could you help to review this? > Wrong display of resources when cluster resources are less than min resources > - > > Key: YARN-8653 > URL: https://issues.apache.org/jira/browse/YARN-8653 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Jinjiang Ling >Assignee: Jinjiang Ling >Priority: Major > Attachments: YARN-8653.001.patch, YARN-8653.001.patch, > wrong_resource_in_fairscheduler.JPG > > > If the cluster resources are less than the min resources of the Fair Scheduler, a > display error like this will happen. > > !wrong_resource_in_fairscheduler.JPG! > In this case, I configured my queue with max resources of 48 vcores, 49152 MB and > min resources of 36 vcores, 36864 MB. But the cluster resources are only 24 > vcores and 24576 MB. The max resources are then capped at the cluster > resources, but the min resources and steady fair share still show the configured > values. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8787) Fix broken list items in PlacementConstraints documentation
[ https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8787: -- Summary: Fix broken list items in PlacementConstraints documentation (was: Fix broken list items in PlacementConstraints.md.vm) > Fix broken list items in PlacementConstraints documentation > --- > > Key: YARN-8787 > URL: https://issues.apache.org/jira/browse/YARN-8787 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.1 >Reporter: Masahiro Tanaka >Assignee: Masahiro Tanaka >Priority: Minor > Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, > listitems1.PNG > > > It looks like some parts of the document below should be list items. > https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html > It might be because of missing newlines before listing. > https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints.md.vm
[ https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618746#comment-16618746 ] Weiwei Yang commented on YARN-8787: --- +1 as well, thanks [~masatana], [~ajisakaa], I'll help to commit this shortly. > Fix broken list items in PlacementConstraints.md.vm > --- > > Key: YARN-8787 > URL: https://issues.apache.org/jira/browse/YARN-8787 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.1 >Reporter: Masahiro Tanaka >Assignee: Masahiro Tanaka >Priority: Minor > Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, > listitems1.PNG > > > It looks like some parts of the document below should be list items. > https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html > It might be because of missing newlines before listing. > https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints.md.vm
[ https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618720#comment-16618720 ] Akira Ajisaka commented on YARN-8787: - +1, thanks [~masatana]. > Fix broken list items in PlacementConstraints.md.vm > --- > > Key: YARN-8787 > URL: https://issues.apache.org/jira/browse/YARN-8787 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.1 >Reporter: Masahiro Tanaka >Assignee: Masahiro Tanaka >Priority: Minor > Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, > listitems1.PNG > > > It looks like some parts of the document below should be list items. > https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html > It might be because of missing newlines before listing. > https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints.md.vm
[ https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618633#comment-16618633 ] Hadoop QA commented on YARN-8787: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 28m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 40m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8787 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940144/YARN-8787.0.patch | | Optional Tests | dupname asflicense mvnsite | | uname | Linux b512f64ce90c 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bbeca01 | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 440 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21860/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Fix broken list items in PlacementConstraints.md.vm > --- > > Key: YARN-8787 > URL: https://issues.apache.org/jira/browse/YARN-8787 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 3.1.1 >Reporter: Masahiro Tanaka >Assignee: Masahiro Tanaka >Priority: Minor > Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, > listitems1.PNG > > > It looks like some parts of the document below should be list items. 
> https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html > It might be because of missing newlines before listing. > https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8652) [UI2] YARN UI2 breaks if getUserInfo REST API is not available in older versions.
[ https://issues.apache.org/jira/browse/YARN-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618625#comment-16618625 ] Hudson commented on YARN-8652: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14989 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14989/]) YARN-8652. [UI2] YARN UI2 breaks if getUserInfo REST API is not (sunilg: rev bbeca0107e247ae14cfe96761f9e5fbb1f02e53d) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/application.js * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/application.js > [UI2] YARN UI2 breaks if getUserInfo REST API is not available in older > versions. > - > > Key: YARN-8652 > URL: https://issues.apache.org/jira/browse/YARN-8652 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akhil PB >Assignee: Akhil PB >Priority: Critical > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8652.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8726) [UI2] YARN UI2 is not accessible when config.env file failed to load
[ https://issues.apache.org/jira/browse/YARN-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618600#comment-16618600 ] Hudson commented on YARN-8726: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14988 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14988/]) YARN-8726. [UI2] YARN UI2 is not accessible when config.env file failed (sunilg: rev 0cc6e039454127984a3aa5b2ba5d9151e4a72dd4) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/initializers/loader.js > [UI2] YARN UI2 is not accessible when config.env file failed to load > > > Key: YARN-8726 > URL: https://issues.apache.org/jira/browse/YARN-8726 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Akhil PB >Assignee: Akhil PB >Priority: Critical > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8726.001.patch > > > It is observed that yarn UI2 is not accessible. When UI2 is inspected, it > gives below error > {code:java} > index.html:1 Refused to execute script from > 'http://ctr-e138-1518143905142-456429-01-05.hwx.site:8088/ui2/config/configs.env' > because its MIME type ('text/plain') is not executable, and strict MIME type > checking is enabled. > yarn-ui.js:219 base url: > vendor.js:1978 ReferenceError: ENV is not defined > at updateConfigs (yarn-ui.js:212) > at Object.initialize (yarn-ui.js:218) > at vendor.js:824 > at vendor.js:825 > at visit (vendor.js:3025) > at Object.visit [as default] (vendor.js:3024) > at DAG.topsort (vendor.js:750) > at Class._runInitializer (vendor.js:825) > at Class.runInitializers (vendor.js:824) > at Class._bootSync (vendor.js:823) > onerrorDefault @ vendor.js:1978 > trigger @ vendor.js:2967 > (anonymous) @ vendor.js:3006 > invoke @ vendor.js:626 > flush @ vendor.js:629 > flush @ vendor.js:619 > end @ vendor.js:642 > run @ vendor.js:648 > join @ vendor.js:648 > run.join @ vendor.js:1510 > (anonymous) @ vendor.js:1512 > fire @ vendor.js:230 > fireWith @ vendor.js:235 > ready @ vendor.js:242 > completed @ vendor.js:242 > vendor.js:823 Uncaught ReferenceError: ENV is not defined > at updateConfigs (yarn-ui.js:212) > at Object.initialize (yarn-ui.js:218) > at vendor.js:824 > at vendor.js:825 > at visit (vendor.js:3025) > at Object.visit [as default] (vendor.js:3024) > at DAG.topsort (vendor.js:750) > at Class._runInitializer (vendor.js:825) > at Class.runInitializers (vendor.js:824) > at Class._bootSync (vendor.js:823) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart
[ https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618592#comment-16618592 ] Sunil Govindan commented on YARN-8753: -- This needs a rebase to trunk. [~yeshavora] please help. > [UI2] Lost nodes representation missing from Nodemanagers Chart > --- > > Key: YARN-8753 > URL: https://issues.apache.org/jira/browse/YARN-8753 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Yesha Vora >Priority: Major > Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot > 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, > YARN-8753.001.patch > > > The Nodemanagers chart is present on the Cluster overview and Nodes->Nodes Status > pages. > This chart does not show nodemanagers if they are LOST. > Due to this issue, the Node information page and the Node status page show different > node manager counts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org