[jira] [Issue Comment Deleted] (YARN-9401) Fix `yarn version` printing the same version info as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wanqiang Ji updated YARN-9401: -- Comment: was deleted (was: Hi, [~eyang]. Can you help to review this?) > Fix `yarn version` printing the same version info as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by the `yarn` shell script mistakenly using > `org.apache.hadoop.util.VersionInfo` instead of > `org.apache.hadoop.yarn.util.YarnVersionInfo` as the `HADOOP_CLASSNAME`. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9366) Make logs in TimelineClient implementation specific to application
[ https://issues.apache.org/jira/browse/YARN-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803598#comment-16803598 ] Prabha Manepalli edited comment on YARN-9366 at 3/28/19 5:24 AM: - Thanks for finding this, [~abmodi]. I missed adding the TimelineEntities file to the patch. I will add it and upload a new patch. was (Author: prabham): Thanks for finding this, [~abmodi]. I missed adding the TimelineEntities file to the patch file. I will add it and upload a new patch. > Make logs in TimelineClient implementation specific to application > --- > > Key: YARN-9366 > URL: https://issues.apache.org/jira/browse/YARN-9366 > Project: Hadoop YARN > Issue Type: Improvement > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Minor > Attachments: YARN-9366.v1.patch > > > For every container launched on an NM node, a timeline client is created to > publish entities to the corresponding application's timeline collector, so > multiple timeline clients run at the same time. The current timeline client > logs are insufficient to isolate publishing problems related to one > application. Hence, creating this Jira to improve the logs in > TimelineV2ClientImpl.
[jira] [Commented] (YARN-9366) Make logs in TimelineClient implementation specific to application
[ https://issues.apache.org/jira/browse/YARN-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803598#comment-16803598 ] Prabha Manepalli commented on YARN-9366: Thanks for finding this, [~abmodi]. I missed adding the TimelineEntities file to the patch file. I will add it and upload a new patch. > Make logs in TimelineClient implementation specific to application > --- > > Key: YARN-9366 > URL: https://issues.apache.org/jira/browse/YARN-9366 > Project: Hadoop YARN > Issue Type: Improvement > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Minor > Attachments: YARN-9366.v1.patch > > > For every container launched on an NM node, a timeline client is created to > publish entities to the corresponding application's timeline collector, so > multiple timeline clients run at the same time. The current timeline client > logs are insufficient to isolate publishing problems related to one > application. Hence, creating this Jira to improve the logs in > TimelineV2ClientImpl.
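[Editorial note] The improvement requested above — making concurrent timeline-client log lines attributable to a single application — can be sketched as a tiny helper that prefixes every message with the application id. This is only an illustration of the idea, not the YARN-9366 patch; the class and method names are invented:

```java
// Hypothetical sketch (not the actual YARN-9366 change): prefix every
// timeline-client log line with the application id so that grepping the NM
// log for one appId isolates that application's publishing activity.
class AppScopedLog {
    private final String appId;

    AppScopedLog(String appId) {
        this.appId = appId;
    }

    // Returns the message decorated with the owning application's id.
    String format(String message) {
        return "[" + appId + "] " + message;
    }

    public static void main(String[] args) {
        AppScopedLog log = new AppScopedLog("application_1551103768202_0001");
        System.out.println(log.format("Publishing entity to timeline collector"));
    }
}
```

With such a prefix, publishing failures from one application can be separated from the dozens of other clients running on the same node.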
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803574#comment-16803574 ] Eric Yang commented on YARN-9292: - {quote}For images, we probably need to write command file to a path independent of containers under nmPrivate directory. Our code can ensure that once the command is executed, the temp .cmd file is deleted. I do think it is important that we don't expose this API with container/container id in it because there is no logical relation between the image and the container.{quote} The cmd file is placed in the application directory, which the current logic already deletes, so there is no additional cleanup code to be written. A side benefit is that the caller needs to know the running application ID to generate a container id that can call the docker images command. This makes it more difficult for an external party without a running app to get to the docker images command. The current code reduces exposure of the docker images command to unauthorized users, and is less likely to open a security hole in the PrivilegedOperation/container-executor flow for initializing the secure directory and cleaning it up. > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development for YARN-9184 to keep the docker image consistent > within a job.
One of the ideas to keep the :latest tag consistent for a job is > to use the docker images command to figure out the image id and > propagate it to the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start. > # If the image id is used to start other containers, container-executor may have > problems checking whether the image comes from a trusted source. Both the image > name and ID must be supplied through the .cmd file to container-executor. However, > an attacker can supply an incorrect image id and defeat container-executor security > checks. > If we can overcome those challenges, it may be possible to keep the docker image > consistent within one application.
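[Editorial note] The propagation idea in the description — resolve ":latest" to a concrete image id once, then hand the pinned id to every later container request — can be sketched as follows. The class name and the resolver callback are invented for illustration; the real lookup would be a "docker images" query on the first node, not this stand-in:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the idea in the YARN-9292 description (not the
// actual patch): pin a moving tag to the image id seen at first-container
// launch, so every subsequent container of the application uses the same id
// even if the tag is re-pushed mid-run.
class ImagePinner {
    private final Map<String, String> pinned = new HashMap<>();

    // 'resolver' stands in for a "docker images" lookup; it runs at most
    // once per tag, and later calls reuse the cached id.
    String pin(String tag, java.util.function.Function<String, String> resolver) {
        return pinned.computeIfAbsent(tag, resolver);
    }

    public static void main(String[] args) {
        ImagePinner pinner = new ImagePinner();
        String id = pinner.pin("centos:latest", tag -> "sha256:abc123");
        System.out.println("centos:latest pinned to " + id);
    }
}
```

Even if the registry's :latest moves while containers are still being requested, every request after the first sees the originally pinned id.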
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803551#comment-16803551 ] Chandni Singh commented on YARN-9292: - {quote} Real container id of the application master provides the already initialized path and .cmd file is stored in existing container directory. cmd file gets clean up when application is finished. Using randomly generated container id will not clean up as nicely. {quote} [~eyang] In patch 6, a random container id is already being created on the client side, which is the {{ServiceScheduler}}. It creates a container id from the appId and the current system time. {code} + ContainerId cid = ContainerId + .newContainerId(ApplicationAttemptId.newInstance(appId, 1), + System.currentTimeMillis()); {code} For images, we probably need to write the command file to a path independent of containers under the nmPrivate directory. Our code can ensure that once the command is executed, the temp .cmd file is deleted. I do think it is important that we don't expose this API with container/container id in it, because there is no logical relation between the image and the container. > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development for YARN-9184 to keep the docker image consistent > within a job.
One of the ideas to keep the :latest tag consistent for a job is > to use the docker images command to figure out the image id and > propagate it to the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start. > # If the image id is used to start other containers, container-executor may have > problems checking whether the image comes from a trusted source. Both the image > name and ID must be supplied through the .cmd file to container-executor. However, > an attacker can supply an incorrect image id and defeat container-executor security > checks. > If we can overcome those challenges, it may be possible to keep the docker image > consistent within one application.
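[Editorial note] The cleanup discipline debated in the comments above — write a temp .cmd file, run the command, and guarantee the file never outlives the call — is a standard write/execute/delete-in-finally pattern. A minimal sketch, with an invented class name and a callback standing in for the container-executor invocation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the temp-.cmd-file lifecycle discussed above (not
// the real container-executor flow): the file is created, handed to the
// executor stand-in, and deleted in a finally block so it cannot leak even
// when execution throws.
class CmdFileRunner {
    static String runWithCmdFile(String cmdContents,
            java.util.function.Function<Path, String> exec) throws IOException {
        Path cmdFile = Files.createTempFile("docker-images", ".cmd");
        try {
            Files.write(cmdFile, cmdContents.getBytes("UTF-8"));
            return exec.apply(cmdFile);    // stand-in for container-executor
        } finally {
            Files.deleteIfExists(cmdFile); // temp .cmd never outlives the call
        }
    }

    public static void main(String[] args) throws IOException {
        String out = runWithCmdFile("docker-command=images",
                path -> "executed with " + path.getFileName());
        System.out.println(out);
    }
}
```

Tying the file's lifetime to the call, rather than to a container directory, is what makes a container-independent path under nmPrivate workable.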
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803546#comment-16803546 ] Jonathan Hung commented on YARN-8200: - * TestSecureLogins failure related to HADOOP-16031 * [TestOpportunisticContainerAllocatorAMService.testContainerPromoteAndDemoteBeforeContainerStart|https://builds.apache.org/job/PreCommit-YARN-Build/23820/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestOpportunisticContainerAllocatorAMService/testContainerPromoteAndDemoteBeforeContainerStart/] related to YARN-8011 * TestOpportunisticContainerAllocatorAMService.testAppAttemptRemovalAfterNodeRemoval is already failing in branch-3.0 * TestNodeLabelContainerAllocation related to YARN-9006 > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-2.002.patch, YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready.
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803534#comment-16803534 ] Hadoop QA commented on YARN-8200: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 22m 7s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 26 new or modified test files. {color} | || || || || {color:brown} branch-3.0 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 23s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 44s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 6m 38s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 34s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 24s{color} | {color:green} branch-3.0 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 9s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 41s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 124 new + 1302 unchanged - 27 fixed = 1426 total (was 1329) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 12s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 524 line(s) with tabs. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 4s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} |
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803524#comment-16803524 ] Eric Yang commented on YARN-9292: - [~csingh] Thanks for the review. I tried using a randomly generated container id, but the nmPrivate directory needs to be initialized and tracked separately. The real container id of the application master provides the already initialized path, and the .cmd file is stored in the existing container directory. The cmd file gets cleaned up when the application is finished. Using a randomly generated container id will not clean up as nicely. I will make the logging change and add a new test for ServiceScheduler in the next patch. Thanks > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development for YARN-9184 to keep the docker image consistent > within a job. One of the ideas to keep the :latest tag consistent for a job is > to use the docker images command to figure out the image id and propagate it to > the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start.
> # If the image id is used to start other containers, container-executor may have > problems checking whether the image comes from a trusted source. Both the image > name and ID must be supplied through the .cmd file to container-executor. However, > an attacker can supply an incorrect image id and defeat container-executor security > checks. > If we can overcome those challenges, it may be possible to keep the docker image > consistent within one application.
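[Editorial note] Challenge #2 above — a forged image id riding in on a trusted image name — suggests pairing the two checks: accept an id only when the name is from a trusted registry AND the id matches what that name actually resolved to. A minimal sketch of that pairing (invented names; not container-executor's real check):

```java
import java.util.Map;

// Hypothetical sketch of the trust check described in challenge #2 (not the
// actual container-executor logic): the name must come from the trusted
// registry, and the id must equal the id that name resolved to, so a forged
// id cannot be smuggled in alongside a trusted name.
class TrustedImageCheck {
    static boolean allowed(String name, String id,
                           String trustedRegistry, Map<String, String> resolvedIds) {
        return name.startsWith(trustedRegistry + "/")
                && id.equals(resolvedIds.get(name));
    }

    public static void main(String[] args) {
        Map<String, String> resolved = new java.util.HashMap<>();
        resolved.put("local/centos:latest", "sha256:abc123");
        System.out.println(allowed("local/centos:latest", "sha256:abc123",
                "local", resolved));
    }
}
```

The key design point is that the id is never trusted on its own: it is only meaningful relative to the name's own resolution.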
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803519#comment-16803519 ] Jonathan Hung commented on YARN-8200: - Attached YARN-8200-branch-2.002 containing all the commits targeted for branch-2. (The commit list is at the YARN-8200 branch, which has been rebased on the latest branch-2.) > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-2.002.patch, YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready.
[jira] [Updated] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-8200: Attachment: YARN-8200-branch-2.002.patch > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-2.002.patch, YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready.
[jira] [Assigned] (YARN-6063) Inline java doc with Hadoop formatter
[ https://issues.apache.org/jira/browse/YARN-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WEI-HSIAO-LEE reassigned YARN-6063: --- Assignee: WEI-HSIAO-LEE > Inline java doc with Hadoop formatter > - > > Key: YARN-6063 > URL: https://issues.apache.org/jira/browse/YARN-6063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: WEI-HSIAO-LEE >Priority: Trivial > Labels: newbie > > Most of the javadoc in the TimelineReader classes does not meet the > Hadoop formatter. This means every patch preparation needs extra attention on > top of the real fix. It would be better for the source code to be in line with > the Hadoop formatter.
[jira] [Commented] (YARN-9412) Backport YARN-6909 to branch-2
[ https://issues.apache.org/jira/browse/YARN-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803465#comment-16803465 ] Anthony Hsu commented on YARN-9412: --- Awesome! > Backport YARN-6909 to branch-2 > -- > > Key: YARN-9412 > URL: https://issues.apache.org/jira/browse/YARN-9412 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major >
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803429#comment-16803429 ] Chandni Singh commented on YARN-9292: - [~eyang] The REST API added here to find the image is independent of any container, so I don't think we should have the container and container id in the path. {code} @Path("/container/{id}/docker/images/{name}") {code} If this is done because the DockerCommandExecutor needs a container id, we could change the implementation here to use a dummy container id. This implementation could be fixed later, but the REST API will not be affected and will remain unchanged. {code} String output = DockerCommandExecutor.executeDockerCommand( dockerImagesCommand, id, null, privOpExecutor, false, nmContext); {code} We could generate a dummy container id here instead of doing it in every client. Some other nitpicks: 1. Log statements in ServiceScheduler can be parameterized, which improves readability. {code} LOG.info("Docker image: " + id + " maps to: " + imageId); -> LOG.info("Docker image: {} maps to: {}", id, imageId); {code} 2. There aren't any tests for the new code added to {{ServiceScheduler}}. Would it be possible to add one? > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development for YARN-9184 to keep the docker image consistent > within a job.
One of the ideas to keep the :latest tag consistent for a job is > to use the docker images command to figure out the image id and > propagate it to the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start. > # If the image id is used to start other containers, container-executor may have > problems checking whether the image comes from a trusted source. Both the image > name and ID must be supplied through the .cmd file to container-executor. However, > an attacker can supply an incorrect image id and defeat container-executor security > checks. > If we can overcome those challenges, it may be possible to keep the docker image > consistent within one application.
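[Editorial note] The parameterized-logging nitpick in the comment above relies on SLF4J's "{}" placeholders, which defer string building until the message is actually logged. The substitution itself is simple; a self-contained illustration of what the placeholders do (the real formatter is SLF4J's, not this class):

```java
// Minimal illustration of SLF4J-style "{}" substitution, as suggested in the
// review comment above. This is NOT the SLF4J implementation; it only shows
// what LOG.info("Docker image: {} maps to: {}", id, imageId) expands to.
class ParamLog {
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        // Replace each "{}" in order with the next argument.
        while ((at = pattern.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(pattern, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }

    public static void main(String[] args) {
        System.out.println(format("Docker image: {} maps to: {}",
                "centos:latest", "sha256:abc123"));
    }
}
```

Besides readability, the parameterized form avoids concatenation cost when the log level is disabled, since arguments are only stringified if the message is emitted.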
[jira] [Commented] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
[ https://issues.apache.org/jira/browse/YARN-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803416#comment-16803416 ] Eric Yang commented on YARN-9421: - There are a few corner cases to consider. If the size of the YARN cluster changes frequently, the safe mode mechanism might kick in at random times. If jobs are queued during safe mode, job queue tracking also increases the memory usage of the resource manager. At some point, the queue will be full because there is a finite amount of tracking memory in the resource manager. What happens if the job queue is full, and what happens if jobs take too long to start and miss their SLA? If the job queue is full, it falls back to the same type of error message showing resources unavailable. It might be better to let the client-side retry decision kick in sooner rather than queuing and finding out the queue is full later. Option 2 is an option to mask transient problems, but the retry logic still depends on the client to make the right decision. I think the default behavior does not need to change for production clusters, but option 2 is nice to have for improving the user experience on test clusters. > Implement SafeMode for ResourceManager by defining a resource threshold > --- > > Key: YARN-9421 > URL: https://issues.apache.org/jira/browse/YARN-9421 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Szilard Nemeth >Priority: Major > Attachments: client-log.log, nodemanager.log, resourcemanager.log > > > We have a hypothetical testcase in our test suite that tests Resource Types. > The test does the following: > 1. Sets up a resource named "gpu" > 2. Out of 9 NodeManager nodes, 1 node has 100 of "gpu". > 3.
It executes a sleep job with resource requests: > "-Dmapreduce.reduce.resource.gpu=7" and > "-Dyarn.app.mapreduce.am.resource.gpu=11" > Sometimes, we encounter situations where the app submission fails with: > {code:java} > 2019-02-25 06:09:56,795 WARN > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RM app submission > failed in validating AM resource request for application > application_1551103768202_0001 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is greater > than maximum allowed allocation. Requested resource type=[gpu], Requested > resource=, maximum allowed > allocation=, please note that maximum allowed > allocation is calculated by scheduler based on maximum resource of registered > NodeManagers, which might be less than configured maximum > allocation={code} > It's clearly visible that the maximum allowed allocation does not have any > "gpu" resources. > > Looking into the logs further, I realized that sometimes the node with the > "gpu" resources registers after the app is submitted. > In a real-world situation, and even with this very special test execution, we > can't be sure in which order NMs register with the RM. > With the advent of resource types, this issue is more likely to surface. > If we have a cluster with some "rare" resources like GPUs only on some nodes > out of a 100, we can quickly run into a situation where the NMs with GPUs > register later than the normal nodes. While the critical NMs are still > registering, we will most likely experience the same > InvalidResourceRequestException if we submit jobs requesting GPUs. > There is a naive solution to this: > 1. Give the RM some time to wait for NMs to register > and put submitted applications on hold. This could work in some situations, > but it's not the most flexible solution, as different clusters can have > different requirements.
Of course, we can make this more flexible by making > the timeout value configurable. > *A more flexible alternative would be:* > 2. We define a threshold of Resource capability: While we haven't reached > this threshold, we put submitted jobs on hold. Once we reached the threshold, > we enable jobs to pass through. > This is very similar to an already existing concept, the SafeMode in HDFS > ([https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode]). > Back to my GPU example above, the threshold could be: 8 vcores, 16GB, 3 > GPUs. > Defining a threshold like this, we can ensure most of the submitted jobs > won't be lost, just "parked" until NMs are registered. > The final solution could be the Resource threshold, or the combination of the > threshold and timeout value. I'm open for any other suggestion as well. > *Last but not least, a very easy way to reproduce the issue on a 3 node > cluster:* > 1. Configure a resource type, named 'testres'. > 2. Node1 runs RM, Node 2/3 runs NMs > 3.
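The threshold idea in option 2 could be sketched roughly as follows. This is a minimal, hypothetical sketch in plain Java: `SafeModeSketch` and its method names are illustrative only, not actual YARN ResourceManager APIs. It only shows the admission decision, not queuing or timeouts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "option 2" resource-threshold SafeMode idea:
// submitted apps are parked until the aggregate resources of registered
// NodeManagers reach a configured threshold (e.g. 8 vcores, 16GB, 3 GPUs).
public class SafeModeSketch {
    private final Map<String, Long> threshold;
    private final Map<String, Long> registered = new HashMap<>();

    public SafeModeSketch(Map<String, Long> threshold) {
        this.threshold = threshold;
    }

    // Called when a NodeManager registers; adds its capacity to the running total.
    public void onNodeRegistered(Map<String, Long> nodeCapacity) {
        nodeCapacity.forEach((res, amount) -> registered.merge(res, amount, Long::sum));
    }

    // SafeMode stays on while any thresholded resource is not yet covered
    // by registered capacity; apps submitted meanwhile would be parked.
    public boolean inSafeMode() {
        return threshold.entrySet().stream()
                .anyMatch(e -> registered.getOrDefault(e.getKey(), 0L) < e.getValue());
    }

    public static void main(String[] args) {
        Map<String, Long> t = new HashMap<>();
        t.put("vcores", 8L);
        t.put("memory-mb", 16384L);
        t.put("gpu", 3L);
        SafeModeSketch sm = new SafeModeSketch(t);

        Map<String, Long> plainNode = new HashMap<>();
        plainNode.put("vcores", 16L);
        plainNode.put("memory-mb", 32768L);
        sm.onNodeRegistered(plainNode);
        System.out.println(sm.inSafeMode());   // true: still waiting for the GPU node

        Map<String, Long> gpuNode = new HashMap<>(plainNode);
        gpuNode.put("gpu", 100L);
        sm.onNodeRegistered(gpuNode);
        System.out.println(sm.inSafeMode());   // false: threshold reached, jobs may pass
    }
}
```

A real implementation would also need to handle nodes unregistering (which could push the cluster back under the threshold), which is one of the corner cases raised in the comment above.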
[jira] [Updated] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
[ https://issues.apache.org/jira/browse/YARN-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9421: - Attachment: resourcemanager.log client-log.log nodemanager.log > Implement SafeMode for ResourceManager by defining a resource threshold > --- > > Key: YARN-9421 > URL: https://issues.apache.org/jira/browse/YARN-9421 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Szilard Nemeth >Priority: Major > Attachments: client-log.log, nodemanager.log, resourcemanager.log > > > We have a hypothetical testcase in our test suite that tests Resource Types. > The test does the following: > 1. Sets up a resource named "gpu" > 2. Out of 9 NodeManager nodes, 1 node has 100 of "gpu". > 3. It executes a sleep job with resource requests: > "-Dmapreduce.reduce.resource.gpu=7" and > "-Dyarn.app.mapreduce.am.resource.gpu=11" > Sometimes, we encounter situations when the app submission fails with: > {code:java} > 2019-02-25 06:09:56,795 WARN > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RM app submission > failed in validating AM resource request for application > application_1551103768202_0001 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is greater > than maximum allowed allocation. Requested resource type=[gpu], Requested > resource=, maximum allowed > allocation=, please note that maximum allowed > allocation is calculated by scheduler based on maximum resource of registered > NodeManagers, which might be less than configured maximum > allocation={code} > It's clearly visible that the maximum allowed allocation does not have any > "gpu" resources. > > Looking into the logs further, I realized that sometimes the node having the > "gpu" resources is registered after the app is submitted. > In a real-world situation, and even with this very special test execution, we > can't be sure in which order NMs register with the RM. > With the advent of resource types, this issue is more likely to surface. > If we have a cluster with some "rare" resources like GPUs only on some nodes > out of 100, we can quickly run into a situation where the NMs with GPUs > register later than the normal nodes. While the critical NMs are still > registering, we will most likely experience the same > InvalidResourceRequestException if we submit jobs requesting GPUs. > There is a naive solution to this: > 1. Give the RM some time to wait for NMs to register, and put submitted > applications on hold. This could work in some situations, but it's not the > most flexible solution, as different clusters can have different > requirements. Of course, we can make this more flexible by making the > timeout value configurable. > *A more flexible alternative would be:* > 2. We define a threshold of Resource capability: while we haven't reached > this threshold, we put submitted jobs on hold. Once we have reached the > threshold, we let jobs pass through. > This is very similar to an already existing concept, the SafeMode in HDFS > ([https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode]). > Back to my GPU example above, the threshold could be: 8 vcores, 16GB, 3 > GPUs. > Defining a threshold like this, we can ensure most of the submitted jobs > won't be lost, just "parked" until the NMs are registered. > The final solution could be the Resource threshold, or a combination of the > threshold and a timeout value. I'm open to any other suggestions as well. > *Last but not least, a very easy way to reproduce the issue on a 3-node > cluster:* > 1. Configure a resource type, named 'testres'. > 2. Node1 runs RM, Node 2/3 run NMs > 3. Node2 has 1 testres > 4. Node3 has 0 testres > 5. Stop all nodes > 6. Start RM on Node1 > 7. Start NM on Node3 (the one without the resource) > 8. Start a pi job, request 1 testres for the AM > Here's the command to start the job: > {code:java} > MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dyarn.app.mapreduce.am.resource.testres=1 1 1000;popd{code} > > *Configurations*: > node1: yarn-site.xml of ResourceManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property>{code} > node2: yarn-site.xml of NodeManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property> > <property> > <name>yarn.nodemanager.resource-type.testres</name> > <value>1</value> > </property>{code} > node3: yarn-site.xml of NodeManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property>{code} > Please see the full process logs from RM, NM, and the YARN client attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
[ https://issues.apache.org/jira/browse/YARN-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9421: Assignee: (was: Szilard Nemeth) > Implement SafeMode for ResourceManager by defining a resource threshold > --- > > Key: YARN-9421 > URL: https://issues.apache.org/jira/browse/YARN-9421 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Szilard Nemeth >Priority: Major -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
Szilard Nemeth created YARN-9421: Summary: Implement SafeMode for ResourceManager by defining a resource threshold Key: YARN-9421 URL: https://issues.apache.org/jira/browse/YARN-9421 Project: Hadoop YARN Issue Type: New Feature Reporter: Szilard Nemeth Assignee: Szilard Nemeth -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: YARN-9281.006.patch > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch, YARN-9281.005.patch, > YARN-9281.006.patch > > > It would be nice to have the ability to upgrade applications deployed by > Application catalog from the Application catalog UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803298#comment-16803298 ] Eric Yang commented on YARN-9281: - Patch 4 contained a section duplicated from YARN-9255. Patch 5 removes the duplicated section. > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch, YARN-9281.005.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: YARN-9281.005.patch > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch, YARN-9281.005.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803287#comment-16803287 ] Hadoop QA commented on YARN-9281: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-9281 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9281 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963933/YARN-9281.004.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23819/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803282#comment-16803282 ] Eric Yang commented on YARN-9281: - Patch 004 moved the build-related changes into YARN-9348 patch 008, and includes some rebasing for changes that happened in YARN-7129. > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9420) Avoid potentially dangerous filename concatenation in native code (cgroups-operations.c)
Szilard Nemeth created YARN-9420: Summary: Avoid potentially dangerous filename concatenation in native code (cgroups-operations.c) Key: YARN-9420 URL: https://issues.apache.org/jira/browse/YARN-9420 Project: Hadoop YARN Issue Type: Improvement Reporter: Szilard Nemeth In cgroups-operations.c, in the function get_cgroups_path_to_write, at the end of the function there's a string formatting operation: {code:java} if (snprintf(buffer, MAX_PATH_LEN, "%s/%s/%s/%s/%s.%s", cgroups_root, hierarchy_name, yarn_hierarchy_name, group_id, hierarchy_name, param_name) < 0) { fprintf(ERRORFILE, "Failed to print output path.\n"); failed = 1; goto cleanup; }{code} This function is called from just one place: update_cgroups_parameters. All calls of update_cgroups_parameters look like this (note that only the last parameter differs): {code:java} update_cgroups_parameters_func_p("devices", "deny", container_id, param_value);{code} So essentially, get_cgroups_path_to_write will have these arguments: 1. hierarchy_name: "devices" 2. param_name: "deny" 3. group_id: container_id An example of a full path: {code:java} /var/lib/yarn-ce/cgroups/devices/hadoop-yarn/c_1/devices.deny{code} , where: 1. cgroups_root = "/var/lib/yarn-ce/cgroups" 2. hierarchy_name = "devices" 3. yarn_hierarchy_name = "/hadoop-yarn" 4. group_id = "c_1" 5. param_name = "deny" The problem is that the last bit of the format string ("%s.%s") relies on the fact that the variable hierarchy_name holds the value "devices", so it can be reused both in the path and in the filename ("devices.deny"). It would be clearer if param_name held the whole filename as-is, e.g. "devices.deny", instead of manually constructing it from 2 strings. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
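The suggested cleanup can be illustrated with the two path-building schemes side by side. This is a sketch in Java for brevity (the real code in cgroups-operations.c is C); the method names are hypothetical, and yarn_hierarchy_name is written without the leading slash here so the example path matches the one quoted in the issue.

```java
// Sketch comparing the current path construction in get_cgroups_path_to_write
// (which glues hierarchy_name and param_name together with "%s.%s") with the
// suggested variant, where the caller passes the full cgroup filename.
public class CgroupsPathSketch {
    // Current scheme: relies on hierarchy_name doubling as the filename prefix.
    static String currentScheme(String root, String hierarchy, String yarnHierarchy,
                                String groupId, String param) {
        return String.format("%s/%s/%s/%s/%s.%s",
                root, hierarchy, yarnHierarchy, groupId, hierarchy, param);
    }

    // Suggested scheme: the caller supplies the whole filename explicitly,
    // e.g. "devices.deny", so no implicit coupling to hierarchy_name remains.
    static String suggestedScheme(String root, String hierarchy, String yarnHierarchy,
                                  String groupId, String fileName) {
        return String.format("%s/%s/%s/%s/%s",
                root, hierarchy, yarnHierarchy, groupId, fileName);
    }

    public static void main(String[] args) {
        String a = currentScheme("/var/lib/yarn-ce/cgroups", "devices",
                "hadoop-yarn", "c_1", "deny");
        String b = suggestedScheme("/var/lib/yarn-ce/cgroups", "devices",
                "hadoop-yarn", "c_1", "devices.deny");
        // Both print /var/lib/yarn-ce/cgroups/devices/hadoop-yarn/c_1/devices.deny
        System.out.println(a);
        System.out.println(b);
    }
}
```

Both schemes produce the same path for this input; the point of the suggestion is that the second makes the filename explicit at the call site instead of deriving half of it from hierarchy_name.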
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: YARN-9281.004.patch > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch > > > It would be nice to have ability to upgrade applications deployed by > Application catalog from Application catalog UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9419) Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
[ https://issues.apache.org/jira/browse/YARN-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9419: Assignee: Gergely Pollak > Log a warning if GPU isolation is enabled but LinuxContainerExecutor is > disabled > > > Key: YARN-9419 > URL: https://issues.apache.org/jira/browse/YARN-9419 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Gergely Pollak >Priority: Major > > At the very least, a WARN log (logged once on startup) should be added that > notifies the user about a potentially offending configuration: GPU isolation > is enabled but LCE is disabled. > I think this is a dangerous, yet valid configuration: as LCE is the only > container executor that utilizes cgroups, no real HW isolation happens if LCE > is disabled. > Let's suppose we have 2 GPU devices in 1 node: > # NM reports 2 devices (as a Resource) to RM > # RM assigns GPU#1 to container#1, which requests 1 GPU device > # When container#2 also requests 1 GPU device, RM is going to assign > either GPU#1 or GPU#2, so there's no guarantee that GPU#2 will be assigned. > If GPU#1 is assigned to a second container, nasty things could happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9419) Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
Szilard Nemeth created YARN-9419: Summary: Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled Key: YARN-9419 URL: https://issues.apache.org/jira/browse/YARN-9419 Project: Hadoop YARN Issue Type: Bug Reporter: Szilard Nemeth -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
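The startup validation proposed in YARN-9419 could look roughly like the sketch below. The property keys (yarn.nodemanager.resource-plugins, yarn.nodemanager.container-executor.class) and the yarn.io/gpu plugin name match real YARN configuration to the best of my knowledge, but the surrounding check is illustrative, not the actual NodeManager code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the warning check proposed in YARN-9419:
// GPU isolation enabled without LinuxContainerExecutor means no cgroups-based
// HW isolation actually happens, so the NM should warn once at startup.
public class GpuIsolationCheck {
    static boolean shouldWarn(Map<String, String> conf) {
        String plugins = conf.getOrDefault("yarn.nodemanager.resource-plugins", "");
        String executor = conf.getOrDefault("yarn.nodemanager.container-executor.class", "");
        boolean gpuEnabled = plugins.contains("yarn.io/gpu");
        boolean lce = executor.endsWith("LinuxContainerExecutor");
        return gpuEnabled && !lce;  // dangerous, yet valid configuration
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.nodemanager.resource-plugins", "yarn.io/gpu");
        conf.put("yarn.nodemanager.container-executor.class",
                "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor");
        if (shouldWarn(conf)) {
            System.out.println("WARN: GPU isolation is enabled but "
                    + "LinuxContainerExecutor is not in use; no HW isolation will happen.");
        }
    }
}
```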
[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803272#comment-16803272 ] Eric Yang commented on YARN-9348: - Patch 008 fixed a git rebase issue in Patch 007. > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch > > > A couple of reports show that Jenkins precommit builds are failing due to an > integration problem between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which > generates many errors. One possible solution is to move the nodejs libraries > from the project top-level directory to the target directory, so they don't > trip the whitespace checks. > # maven clean fails because the clean plugin tries to remove the target > directory and files inside the target/generated-sources directories, causing > race conditions. > # Building on macOS triggers access to the OS X keychain when attempting to > log in to Docker Hub. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.008.patch > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803251#comment-16803251 ] Jonathan Hung commented on YARN-8200: - Attached YARN-8200-branch-3.0.001 containing all the commits targeted for branch-3.0. (The commit list is at the YARN-8200.branch3 branch, which has been rebased on the latest branch-3.0.) > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To avoid supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans Docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-8200: Attachment: YARN-8200-branch-3.0.001.patch > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803250#comment-16803250 ] Jonathan Hung commented on YARN-8200: - Yes [~Jim_Brennan], we plan to do a 2.10 release with this feature.
[jira] [Resolved] (YARN-9412) Backport YARN-6909 to branch-2
[ https://issues.apache.org/jira/browse/YARN-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-9412. - Resolution: Fixed This ended up being a clean port. Closing. > Backport YARN-6909 to branch-2 > -- > > Key: YARN-9412 > URL: https://issues.apache.org/jira/browse/YARN-9412 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major >
[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803198#comment-16803198 ] Eric Yang commented on YARN-9348: - Rebased patch 007 for changes made in YARN-7129 patch 035. - Moved the parallel-tests profile from YARN-9281 because the change set matches the context of this issue. > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch > > > A couple of reports show Jenkins precommit builds failing due to integration problems between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which generates many errors. One possible solution is to move the nodejs libraries from the project top-level directory to the target directory so that they do not stumble on whitespace checks. > # maven clean fails because the clean plugin tries to remove the target directory and files inside the target/generated-sources directories, causing race conditions. > # Building on Mac triggers access to the OS X keychain in an attempt to log in to Dockerhub.
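For the maven clean race described above, one possible shape of a fix is to take over the clean plugin's fileset so it skips the contested generated-sources output. This is only an illustrative sketch of the `maven-clean-plugin` configuration, not the actual change in patch 007:

```xml
<!-- Sketch: keep maven clean from deleting target/generated-sources,
     where another plugin may still be writing files concurrently.
     Not taken from the YARN-9348 patch. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <configuration>
    <!-- Disable the default deletion of the whole target directory -->
    <excludeDefaultDirectories>true</excludeDefaultDirectories>
    <filesets>
      <fileset>
        <directory>${project.build.directory}</directory>
        <excludes>
          <!-- Leave generated-sources alone to avoid the race -->
          <exclude>generated-sources/**</exclude>
        </excludes>
      </fileset>
    </filesets>
  </configuration>
</plugin>
```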
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.007.patch
[jira] [Created] (YARN-9418) ATSv2 /apps/${appId}/entities/YARN_CONTAINER REST API does not show metrics
Prabhu Joseph created YARN-9418: --- Summary: ATSv2 /apps/${appId}/entities/YARN_CONTAINER REST API does not show metrics Key: YARN-9418 URL: https://issues.apache.org/jira/browse/YARN-9418 Project: Hadoop YARN Issue Type: Bug Components: ATSv2 Affects Versions: 3.2.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph The ATSv2 entities REST API does not show the metrics: {code:java} [hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq . { "metrics": [], "events": [], "createdtime": 1553695002014, "idprefix": 0, "type": "YARN_CONTAINER", "id": "container_e18_1553685341603_0006_01_01", "info": { "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" }, "configs": {}, "isrelatedto": {}, "relatesto": {} }{code} The NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but these are not shown in the output above. The NM container entries are written with the correct flowRunId (the start time of the job), whereas the RM container entries are written with the default 0. TimelineReader fetches only the rows written by the RM (i.e., row keys with flowRunId 0).
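The entity query shown above follows the ATSv2 per-app entities path, with `fields=METRICS` selecting the metrics field. A minimal sketch of how such a URL is assembled (the host, port, and helper name here are illustrative, not from the reader code):

```java
// Sketch: assemble an ATSv2 entity-query URL like the curl command in the
// report. Only string construction; no HTTP call is made here.
public class TimelineEntityUrl {
    static String entityUrl(String host, int port, String appId,
                            String entityType, String entityId) {
        // fields=METRICS asks the TimelineReader to include the entity's
        // metrics in the response; user.name identifies the caller on an
        // unsecured cluster.
        return String.format(
            "http://%s:%d/ws/v2/timeline/apps/%s/entities/%s/%s"
                + "?user.name=hbase&fields=METRICS",
            host, port, appId, entityType, entityId);
    }

    public static void main(String[] args) {
        System.out.println(entityUrl("yarn-ats-3", 8198,
            "application_1553685341603_0006", "YARN_CONTAINER",
            "container_e18_1553685341603_0006_01_01"));
    }
}
```

A query built this way still returns empty metrics for the bug described above, because the reader only matches rows keyed with flowRunId 0.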
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.006.patch
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: (was: YARN-9281.004.patch) > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch > > > It would be nice to have the ability to upgrade applications deployed by the Application catalog from the Application catalog UI.
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: (was: YARN-9348.006.patch)
[jira] [Issue Comment Deleted] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Comment: was deleted (was: Rebase patch to match changes happened in YARN-7129.)
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.006.patch
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: YARN-9281.004.patch
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803165#comment-16803165 ] Eric Yang commented on YARN-9281: - Rebased the patch to match changes made in YARN-7129.
[jira] [Comment Edited] (YARN-9409) Port resource type changes from YARN-7237 to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803163#comment-16803163 ] Jonathan Hung edited comment on YARN-9409 at 3/27/19 6:11 PM: -- Thanks Zhe, committed to YARN-8200 and YARN-8200.branch3 was (Author: jhung): Thanks Zhe, committed to YARN-8200 > Port resource type changes from YARN-7237 to branch-3.0/branch-2 > > > Key: YARN-9409 > URL: https://issues.apache.org/jira/browse/YARN-9409 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9409-YARN-8200.branch3.001.patch > >
[jira] [Commented] (YARN-9411) TestYarnNativeServices fails sporadically with bind address in use
[ https://issues.apache.org/jira/browse/YARN-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803152#comment-16803152 ] Prabhu Joseph commented on YARN-9411: - Thanks [~giovanni.fumarola]! > TestYarnNativeServices fails sporadically with bind address in use > -- > > Key: YARN-9411 > URL: https://issues.apache.org/jira/browse/YARN-9411 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn-native-services >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9411-001.patch > > > TestYarnNativeServices fails sporadically with bind address in use > {code} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [034772d29930:45301] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:373) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:128) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:503) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:322) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceTestUtils.setupInternal(ServiceTestUtils.java:273) > at > org.apache.hadoop.yarn.service.TestYarnNativeServices.testCreateFlexStopDestroyService(TestYarnNativeServices.java:101) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 
at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [034772d29930:45301] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:66) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:55) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.getServer(ApplicationMasterService.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:191) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:918) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1285) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1326) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891) >
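The BindException above is the classic symptom of concurrent test runs racing for the same hard-coded port. One common remedy (a sketch of the general technique, not necessarily what the YARN-9411 patch does in ServiceTestUtils) is to let the OS hand out an ephemeral port by binding to port 0:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: ask the OS for a currently-free ephemeral port so concurrent test
// JVMs do not collide on a fixed port. Note there is still a small window
// between closing the probe socket and the server binding the port, so this
// reduces, rather than eliminates, bind races.
public class FreePort {
    static int freePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            // SO_REUSEADDR lets the port be rebound promptly after the
            // probe socket closes (avoids TIME_WAIT delays).
            probe.setReuseAddress(true);
            return probe.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("picked port " + freePort());
    }
}
```

A test would call `freePort()` once per server address before starting the MiniYARNCluster, instead of reusing a fixed port across runs.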
[jira] [Commented] (YARN-9411) TestYarnNativeServices fails sporadically with bind address in use
[ https://issues.apache.org/jira/browse/YARN-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803149#comment-16803149 ] Hudson commented on YARN-9411: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16296 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16296/]) YARN-9411. TestYarnNativeServices fails sporadically with bind address (gifuma: rev 9cd66198ee8c2e531fa17a306e33c49d054a1ef7) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java
[jira] [Commented] (YARN-9411) TestYarnNativeServices fails sporadically with bind address in use
[ https://issues.apache.org/jira/browse/YARN-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803132#comment-16803132 ] Giovanni Matteo Fumarola commented on YARN-9411: Thanks [~Prabhu Joseph]. LGTM +1. Committed to trunk.
[jira] [Updated] (YARN-9411) TestYarnNativeServices fails sporadically with bind address in use
[ https://issues.apache.org/jira/browse/YARN-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9411: --- Fix Version/s: 3.3.0
[jira] [Commented] (YARN-9269) Minor cleanup in FpgaResourceAllocator
[ https://issues.apache.org/jira/browse/YARN-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803117#comment-16803117 ] Hudson commented on YARN-9269: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16295 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16295/]) YARN-9269. Minor cleanup in FpgaResourceAllocator. Contributed by Peter (devaraj: rev a4cd75e09c934699ec5e2fa969f1c8d0a14c1d49) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/FpgaResourceAllocator.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/FpgaResourceHandlerImpl.java > Minor cleanup in FpgaResourceAllocator > -- > > Key: YARN-9269 > URL: https://issues.apache.org/jira/browse/YARN-9269 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9269-001.patch, YARN-9269-002.patch, > YARN-9269-003.patch, YARN-9269-004.patch, YARN-9269-005.patch > > > Some stuff that we observed: > * {{addFpga()}} - we check for duplicate devices, but we don't print any > error/warning if there's any. > * {{findMatchedFpga()}} should be called {{findMatchingFpga()}}. Also, is > this method even needed? We already receive an {{FpgaDevice}} instance in > {{updateFpga()}} which I believe is the same that we're looking up. > * variable {{IPIDpreference}} is confusing > * {{availableFpga}} / {{usedFpgaByRequestor}} are instances of > {{LinkedHashMap}}. What's the rationale behind this? Doesn't a simple > {{HashMap}} suffice? 
> * {{usedFpgaByRequestor}} should be renamed, naming is a bit unclear > * {{allowedFpgas}} should be an immutable list > * {{@VisibleForTesting}} methods should be package private > * get rid of {{*}} imports -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
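Two of the cleanup points above (the immutable {{allowedFpgas}} list and the LinkedHashMap-vs-HashMap question) can be sketched in plain Java. The field and device names below are illustrative stand-ins, not the actual FpgaResourceAllocator members:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        // Immutable view for a field like allowedFpgas: callers can read it,
        // but any mutation attempt fails fast instead of silently corrupting state.
        List<String> allowedFpgas = Collections.unmodifiableList(
                new ArrayList<>(Arrays.asList("IntelOpenCL-acl0", "IntelOpenCL-acl1")));

        // A plain HashMap is enough when iteration order carries no meaning;
        // LinkedHashMap only buys predictable iteration order at extra cost.
        Map<String, List<String>> availableFpga = new HashMap<>();
        availableFpga.put("IntelOpenCL", allowedFpgas);

        try {
            allowedFpgas.add("IntelOpenCL-acl2");
        } catch (UnsupportedOperationException e) {
            System.out.println("allowedFpgas is immutable");
        }
    }
}
```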
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803090#comment-16803090 ] Devaraj K commented on YARN-9270: - [~pbacsko], can you rebase this patch? > Minor cleanup in TestFpgaDiscoverer > --- > > Key: YARN-9270 > URL: https://issues.apache.org/jira/browse/YARN-9270 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9270-001.patch, YARN-9270-002.patch, > YARN-9270-003.patch > > > Let's do some cleanup in this class. > * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split > up into 5 different tests, because it tests 5 different scenarios. > * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a > {{Function}} in the plugin class like {{Function<String, String> envProvider > = System::getenv}} plus a setter method which allows the test to modify > {{envProvider}}. Much simpler and more straightforward. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
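The injectable-environment idea proposed above can be sketched like this. The class, method, and environment-variable names are hypothetical placeholders, not the actual FpgaDiscoverer API:

```java
import java.util.function.Function;

// Sketch of replacing setNewEnvironmentHack() with an injectable env provider.
class FpgaDiscovererSketch {
    // Defaults to the real environment; a test swaps in a stub via the setter.
    private Function<String, String> envProvider = System::getenv;

    void setEnvProvider(Function<String, String> provider) {
        this.envProvider = provider;
    }

    String lookup(String name) {
        return envProvider.apply(name);
    }
}

public class Main {
    public static void main(String[] args) {
        FpgaDiscovererSketch discoverer = new FpgaDiscovererSketch();
        // In a test, no process-wide environment hack is needed:
        discoverer.setEnvProvider(name -> "/opt/intel/fpga_sdk");
        System.out.println(discoverer.lookup("SDK_ROOT"));
    }
}
```

Production code never calls the setter, so the default {{System::getenv}} path stays in effect outside tests.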
[jira] [Updated] (YARN-9269) Minor cleanup in FpgaResourceAllocator
[ https://issues.apache.org/jira/browse/YARN-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-9269: Priority: Minor (was: Major) Hadoop Flags: Reviewed +1, latest patch looks good to me, committing it shortly. > Minor cleanup in FpgaResourceAllocator > -- > > Key: YARN-9269 > URL: https://issues.apache.org/jira/browse/YARN-9269 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: YARN-9269-001.patch, YARN-9269-002.patch, > YARN-9269-003.patch, YARN-9269-004.patch, YARN-9269-005.patch > > > Some stuff that we observed: > * {{addFpga()}} - we check for duplicate devices, but we don't print any > error/warning if there's any. > * {{findMatchedFpga()}} should be called {{findMatchingFpga()}}. Also, is > this method even needed? We already receive an {{FpgaDevice}} instance in > {{updateFpga()}} which I believe is the same that we're looking up. > * variable {{IPIDpreference}} is confusing > * {{availableFpga}} / {{usedFpgaByRequestor}} are instances of > {{LinkedHashMap}}. What's the rationale behind this? Doesn't a simple > {{HashMap}} suffice? > * {{usedFpgaByRequestor}} should be renamed, naming is a bit unclear > * {{allowedFpgas}} should be an immutable list > * {{@VisibleForTesting}} methods should be package private > * get rid of {{*}} imports -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9409) Port resource type changes from YARN-7237 to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803044#comment-16803044 ] Zhe Zhang commented on YARN-9409: - +1 > Port resource type changes from YARN-7237 to branch-3.0/branch-2 > > > Key: YARN-9409 > URL: https://issues.apache.org/jira/browse/YARN-9409 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9409-YARN-8200.branch3.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803040#comment-16803040 ] Eric Yang commented on YARN-9414: - [~adam.antal] Thank you for the interest in this feature. I have updated the design document to reflect the recent changes; the new file is YARN-Application-Catalog.pdf. > Application Catalog for YARN applications > - > > Key: YARN-9414 > URL: https://issues.apache.org/jira/browse/YARN-9414 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-Application-Catalog.pdf > > > YARN native services provides a web services API to improve the usability of > application deployment on Hadoop using a collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves the usability of YARN for > managing the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9272) Backport YARN-7738 for refreshing max allocation for multiple resource types
[ https://issues.apache.org/jira/browse/YARN-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803039#comment-16803039 ] Zhe Zhang commented on YARN-9272: - +1 > Backport YARN-7738 for refreshing max allocation for multiple resource types > > > Key: YARN-9272 > URL: https://issues.apache.org/jira/browse/YARN-9272 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9272-YARN-8200.001.patch, > YARN-9272-YARN-8200.branch3.001.patch, YARN-9272-YARN-8200.branch3.002.patch > > > Need to port to YARN-8200.branch3 (for branch-3.0) and YARN-8200 (for > branch-2) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9414: Attachment: YARN-Application-Catalog.pdf > Application Catalog for YARN applications > - > > Key: YARN-9414 > URL: https://issues.apache.org/jira/browse/YARN-9414 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-Application-Catalog.pdf > > > YARN native services provides a web services API to improve the usability of > application deployment on Hadoop using a collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves the usability of YARN for > managing the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9124) Resolve contradiction in ResourceUtils: addMandatoryResources / checkMandatoryResources work differently
[ https://issues.apache.org/jira/browse/YARN-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal reassigned YARN-9124: Assignee: Adam Antal > Resolve contradiction in ResourceUtils: addMandatoryResources / > checkMandatoryResources work differently > > > Key: YARN-9124 > URL: https://issues.apache.org/jira/browse/YARN-9124 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Adam Antal >Priority: Minor > > {{ResourceUtils#addMandatoryResources}}: Adds only memory and vcores as > mandatory resources. > {{ResourceUtils#checkMandatoryResources}}: YARN-6620 added some code to this. > This method not only checks memory and vcores, but all the resources referred > to in ResourceInformation#MANDATORY_RESOURCES. > I think it would be good to rename {{MANDATORY_RESOURCES}} to > {{PREDEFINED_RESOURCES}} or something like that and use a similar name for > {{checkMandatoryResources}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8470) Fair scheduler exception with SLS
[ https://issues.apache.org/jira/browse/YARN-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-8470: Assignee: Szilard Nemeth > Fair scheduler exception with SLS > - > > Key: YARN-8470 > URL: https://issues.apache.org/jira/browse/YARN-8470 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Szilard Nemeth >Priority: Major > > I ran into the following exception with sls: > 2018-06-26 13:34:04,358 ERROR resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802910#comment-16802910 ] Peter Bacsko commented on YARN-4901: The failed unit test passed locally several times. > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-4901-001.patch > > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802911#comment-16802911 ] Szilard Nemeth commented on YARN-9401: -- Hi [~jiwq]! Checked your patch, +1 (non-binding) > Fix `yarn version` print the version info is the same as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by in `yarn` shell used `org.apache.hadoop.util.VersionInfo` > instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the > `HADOOP_CLASSNAME` by mistake. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9366) Make logs in TimelineClient implementation specific to application
[ https://issues.apache.org/jira/browse/YARN-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802875#comment-16802875 ] Abhishek Modi commented on YARN-9366: - Thanks for the patch [~prabham]. In YarnException, you are passing all the timeline entities that were not published. The {{toString()}} of TimelineEntities uses the {{toString()}} of TimelineEntity, but that is not overridden. You need to use dumptimelineRecordsToJson to convert the TimelineEntity into a readable format. I also think you should log these entities at debug level only. > Make logs in TimelineClient implementation specific to application > --- > > Key: YARN-9366 > URL: https://issues.apache.org/jira/browse/YARN-9366 > Project: Hadoop YARN > Issue Type: Improvement > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Minor > Attachments: YARN-9366.v1.patch > > > For every container launched on a NM node, a timeline client is created to > publish entities to the corresponding application's timeline collector. And > there would be multiple timeline clients running at the same time. The current > implementation of timeline client logs is insufficient to isolate publishing > problems related to one application. Hence, creating this Jira to improve > the logs in TimelineV2ClientImpl. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
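The review feedback above amounts to two points: render the unpublished records in a readable (e.g. JSON-like) form rather than relying on a default {{Object.toString()}}, and only pay the serialization cost when debug logging is enabled. A minimal stand-alone sketch, with a plain Map and a hand-rolled serializer standing in for TimelineEntity and the real JSON dump helper:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Main {
    // Stands in for LOG.isDebugEnabled(); the real client would ask its logger.
    static boolean debugEnabled = true;

    // Stand-in for a JSON dump helper: renders the record in a readable form.
    static String toReadable(Map<String, Object> entity) {
        StringBuilder sb = new StringBuilder("{");
        entity.forEach((k, v) ->
                sb.append('"').append(k).append("\":\"").append(v).append("\","));
        if (sb.length() > 1) {
            sb.setLength(sb.length() - 1); // drop trailing comma
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> entity = new LinkedHashMap<>();
        entity.put("id", "entity_0001");
        entity.put("type", "YARN_CONTAINER");

        if (debugEnabled) { // serialize only when the log level will show it
            System.out.println("Could not publish entity: " + toReadable(entity));
        }
    }
}
```

The debug guard matters because serializing every failed batch on a busy NodeManager would otherwise add overhead even when nobody reads the output.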
[jira] [Commented] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802866#comment-16802866 ] Adam Antal commented on YARN-9414: -- [~eyang], thanks for making this umbrella jira. I am not really familiar with this area and would like to go through the design document, but I also noticed that it was written in August 2017. Is this the latest document about this feature? > Application Catalog for YARN applications > - > > Key: YARN-9414 > URL: https://issues.apache.org/jira/browse/YARN-9414 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf > > > YARN native services provides a web services API to improve the usability of > application deployment on Hadoop using a collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves the usability of YARN for > managing the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802864#comment-16802864 ] Hadoop QA commented on YARN-4901: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 5s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-4901 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963879/YARN-4901-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 09cdd948c628 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b226958 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/23816/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23816/testReport/ | | Max. process+thread count | 902 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-9124) Resolve contradiction in ResourceUtils: addMandatoryResources / checkMandatoryResources work differently
[ https://issues.apache.org/jira/browse/YARN-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802842#comment-16802842 ] Szilard Nemeth commented on YARN-9124: -- Hi [~adam.antal]! Sure, you can assign it to yourself freely if the assignee is "Unassigned"! > Resolve contradiction in ResourceUtils: addMandatoryResources / > checkMandatoryResources work differently > > > Key: YARN-9124 > URL: https://issues.apache.org/jira/browse/YARN-9124 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Priority: Minor > > {{ResourceUtils#addMandatoryResources}}: Adds only memory and vcores as > mandatory resources. > {{ResourceUtils#checkMandatoryResources}}: YARN-6620 added some code to this. > This method not only checks memory and vcores, but all the resources referred > to in ResourceInformation#MANDATORY_RESOURCES. > I think it would be good to rename {{MANDATORY_RESOURCES}} to > {{PREDEFINED_RESOURCES}} or something like that and use a similar name for > {{checkMandatoryResources}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802840#comment-16802840 ] Szilard Nemeth commented on YARN-4901: -- hi [~pbacsko]! As per our offline talk, there's no easy way to check if the DefaultMetricsSystem is already running, so it's fine to invoke shutdown without checking any condition, as it won't have any consequence. +1 (non-binding) > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-4901-001.patch > > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9124) Resolve contradiction in ResourceUtils: addMandatoryResources / checkMandatoryResources work differently
[ https://issues.apache.org/jira/browse/YARN-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802839#comment-16802839 ] Adam Antal commented on YARN-9124: -- Hi [~snemeth], Are you no longer working on this? May I assign this to myself? > Resolve contradiction in ResourceUtils: addMandatoryResources / > checkMandatoryResources work differently > > > Key: YARN-9124 > URL: https://issues.apache.org/jira/browse/YARN-9124 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Priority: Minor > > {{ResourceUtils#addMandatoryResources}}: Adds only memory and vcores as > mandatory resources. > {{ResourceUtils#checkMandatoryResources}}: YARN-6620 added some code to this. > This method not only checks memory and vcores, but all the resources referred > to in ResourceInformation#MANDATORY_RESOURCES. > I think it would be good to rename {{MANDATORY_RESOURCES}} to > {{PREDEFINED_RESOURCES}} or something like that and use a similar name for > {{checkMandatoryResources}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9353) TestNMWebFilter should be renamed
[ https://issues.apache.org/jira/browse/YARN-9353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802835#comment-16802835 ] Adam Antal commented on YARN-9353: -- Thanks for the patch [~smileLee]. Verified that only the doc got modified, +1 (non-binding). > TestNMWebFilter should be renamed > - > > Key: YARN-9353 > URL: https://issues.apache.org/jira/browse/YARN-9353 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: WEI-HSIAO-LEE >Priority: Trivial > Labels: newbie, newbie++ > Attachments: YARN-9353-trunk.001.patch > > > TestNMWebFilter should be renamed to TestNMWebAppFilter, as > there is no class named NMWebFilter. The javadoc of the class is also > outdated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802833#comment-16802833 ] Jim Brennan commented on YARN-7848: --- I think I agree with [~eyang] on the use of {{-f}}. By the time we are trying to remove the container, we have already tried to kill the process and stop the container, so I don't think there is any danger in using the -f option, and it may succeed in cases where it otherwise doesn't now. I can't think of anything bad that would happen by using the force option every time in our use cases. > Force removal of docker containers that do not get removed on first try > --- > > Key: YARN-7848 > URL: https://issues.apache.org/jira/browse/YARN-7848 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-7848.001.patch > > > After the addition of YARN-5366, containers will get removed after a certain > debug delay. However, this is a one-time effort. If the removal fails for > whatever reason, the container will persist. We need to add a mechanism for a > forced removal of those containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9417) Implement FS equivalent of AppNameMappingPlacementRule
Wilfred Spiegelenburg created YARN-9417: --- Summary: Implement FS equivalent of AppNameMappingPlacementRule Key: YARN-9417 URL: https://issues.apache.org/jira/browse/YARN-9417 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.3.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The AppNameMappingPlacementRule is only available for the CS. We need the same kind of rule for the FS. The rule should use the application name as set in the submission context. This allows spark, mr or tez jobs to be run in their own queues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9416) Add filter options to FS placement rules
[ https://issues.apache.org/jira/browse/YARN-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802761#comment-16802761 ] Wilfred Spiegelenburg edited comment on YARN-9416 at 3/27/19 1:22 PM: -- The proposal is to add a new child entry to all rules, like the parent rule we have now. Name of the XML node: * filter Names of the attributes supported: * type (_allow_ or _deny_) * users (comma-separated list) * groups (comma-separated _ordered_ list) The type attribute is required. Either the users or the groups attribute can be omitted or left empty. If both are left empty the filter is ignored. The ordering only has an impact on the secondary group rule, and thus the group filter, in combination with the _allow_ type. That is the only rule that loops over a number of values that are returned in a random order by the OS. The order in which the list is specified will be the order in which the secondary groups are evaluated in the rule. When a rule has a filter set we check the filter before we decide whether the queue found will be returned. This is independent of the ACLs. was (Author: wilfreds): The proposal is to add a new child entry to all rules, like the parent rule we have now. Name of the XML node: * userfilter * groupfilter Names of the attributes supported for each: * type (order, allow or deny) * members (comma-separated ordered list) When a rule has a filter set we check the filter before we decide whether the queue found will be returned. This is independent of the ACLs. > Add filter options to FS placement rules > > > Key: YARN-9416 > URL: https://issues.apache.org/jira/browse/YARN-9416 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > The placement rules should allow filtering of the groups and/or users that > match the rule. 
> In the case of the user rule you might want it to only match if the users are > members of a specific group. Another example would be to only allow specific > users to match the specified rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
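A hypothetical fair-scheduler.xml fragment sketching the proposed filter child entry described in the comment above. The filter element and its type, users and groups attributes are the names from the proposal; the rule and user/group names are made up:

```xml
<queuePlacementPolicy>
  <!-- Proposed syntax sketch: only the listed users may match this rule -->
  <rule name="specified" create="false">
    <filter type="allow" users="alice,bob" />
  </rule>
  <!-- Secondary group rule: the ordered group list would control the
       order in which the secondary groups are evaluated -->
  <rule name="secondaryGroupExistingQueue">
    <filter type="allow" groups="etl,analytics" />
  </rule>
  <rule name="default" />
</queuePlacementPolicy>
```

Per the proposal, the filter would be checked before the queue found by the rule is returned, independently of queue ACLs.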
[jira] [Commented] (YARN-9416) Add filter options to FS placement rules
[ https://issues.apache.org/jira/browse/YARN-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802761#comment-16802761 ] Wilfred Spiegelenburg commented on YARN-9416: - The proposal is to add a new child entry to all rules, like the parent rule we have now. Name of the XML node: * userfilter * groupfilter Names of the attributes supported for each: * type (order, allow or deny) * members (comma-separated ordered list) When a rule has a filter set we check the filter before we decide whether the queue found will be returned. This is independent of the ACLs. > Add filter options to FS placement rules > > > Key: YARN-9416 > URL: https://issues.apache.org/jira/browse/YARN-9416 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > The placement rules should allow filtering of the groups and/or users that > match the rule. > In the case of the user rule you might want it to only match if the users are > members of a specific group. Another example would be to only allow specific > users to match the specified rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9416) Add filter options to FS placement rules
Wilfred Spiegelenburg created YARN-9416: --- Summary: Add filter options to FS placement rules Key: YARN-9416 URL: https://issues.apache.org/jira/browse/YARN-9416 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 3.3.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The placement rules should allow filtering of the groups and/or users that match the rule. In the case of the user rule you might want it to only match if the users are members of a specific group. Another example would be to only allow specific users to match the specified rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8793) QueuePlacementPolicy bind more information to assigning result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802741#comment-16802741 ] Hadoop QA commented on YARN-8793: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8793 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8793 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940551/YARN-8793.008.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23818/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > QueuePlacementPolicy bind more information to assigning result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch, YARN-8793.002.patch, > YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, > YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802731#comment-16802731 ] Peter Bacsko commented on YARN-4901: [~snemeth] [~templedf] Could you look at this short patch? I also added {{DefaultMetricsSystem.shutdown()}} because it unregisters an object on JMX. If we don't do this, we might get: {noformat} org.apache.hadoop.metrics2.MetricsException: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=ResourceManager,name=RMNMInfo already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:110) at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:123) at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:64) at org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.<init>(RMNMInfo.java:59) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:749) ... {noformat} > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-4901-001.patch > > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
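The duplicate-registration failure quoted above can be illustrated generically. This is a made-up mini-registry, not the Hadoop metrics API: it only mimics the behaviour where registering the same JMX name twice throws unless a shutdown step unregisters it first, which is why {{DefaultMetricsSystem.shutdown()}} is needed between MockRM instances:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry mimicking JMX bean-name registration:
// registering a name twice fails unless shutdown() runs in between.
public class MBeanRegistry {
    private static final Map<String, Object> beans = new HashMap<>();

    public static void register(String name, Object bean) {
        if (beans.containsKey(name)) {
            // Mirrors "Hadoop:service=...,name=RMNMInfo already exists!"
            throw new IllegalStateException(name + " already exists!");
        }
        beans.put(name, bean);
    }

    // Analogous to DefaultMetricsSystem.shutdown(): clears all
    // registrations so the next "RM instance" can register again.
    public static void shutdown() {
        beans.clear();
    }
}
```

Without the shutdown() call, a second registration of the same name fails exactly like the second MockRM start in the stack trace.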
[jira] [Commented] (YARN-8793) QueuePlacementPolicy bind more information to assigning result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802721#comment-16802721 ] Wilfred Spiegelenburg commented on YARN-8793: - The PlacementRule and PlacementManager have standardised the way a chain is terminated and what is communicated back. The FS has moved to using those interfaces to handle queue placements. Placements are handled outside the scheduler. > QueuePlacementPolicy bind more information to assigning result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch, YARN-8793.002.patch, > YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, > YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to the > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5387) FairScheduler: add the ability to specify a parent queue to all placement rules
[ https://issues.apache.org/jira/browse/YARN-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YARN-5387. - Resolution: Implemented This has been included as part of the YARN-8967 changes. Documentation is still outstanding and will be added as part of YARN-9415. > FairScheduler: add the ability to specify a parent queue to all placement > rules > --- > > Key: YARN-5387 > URL: https://issues.apache.org/jira/browse/YARN-5387 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: supportability > > In the current placement policy all rules generate a queue name under > the root. The only exception is the nestedUserQueue rule. This rule allows a > queue to be created under a parent queue defined by a second rule. > Instead of creating new rules to also allow nested groups, secondary groups > or nested queues, we should generalise this by > allowing a parent attribute to be specified in each rule, like the create flag. > The optional parent attribute for a rule should allow the following values: > - empty (which is the same as not specifying the attribute) > - a rule > - a fixed value (with or without the root prefix) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
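A hypothetical fair-scheduler.xml sketch of the generalised parent attribute described above. The attribute name and the three value forms (empty, a rule, a fixed value) come from the issue description; the exact syntax shipped in YARN-8967 may differ, and the queue names are made up:

```xml
<queuePlacementPolicy>
  <!-- parent is a rule: place the user queue under the primary group queue,
       replacing the old nestedUserQueue two-rule construct -->
  <rule name="user" create="true" parent="primaryGroup" />
  <!-- parent is a fixed value (with or without the root. prefix) -->
  <rule name="user" create="true" parent="root.users" />
  <!-- parent empty or omitted: same as today, queue created under root -->
  <rule name="default" />
</queuePlacementPolicy>
```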
[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802713#comment-16802713 ] Hadoop QA commented on YARN-8795: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8795 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8795 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940534/YARN-8795.004.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23817/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8795.002.patch, YARN-8795.003.patch, > YARN-8795.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802708#comment-16802708 ] Wilfred Spiegelenburg commented on YARN-8795: - The rules have been moved as part of the migration to a new interface. The rules now all use the PlacementRule interface and are located in their own files. > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8795.002.patch, YARN-8795.003.patch, > YARN-8795.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy
[ https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802707#comment-16802707 ] Wilfred Spiegelenburg commented on YARN-8792: - None of these changes fit into the integrated way we currently implement the rules in the FS and CS. As part of YARN-8948, YARN-9298 and finally integrated in YARN-8967 this has been changed. Both schedulers now use the same placement manager and placement rule code. The placement of the application in a queue has also been moved out of the FS. > Revisit FairScheduler QueuePlacementPolicy > --- > > Key: YARN-8792 > URL: https://issues.apache.org/jira/browse/YARN-8792 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > > Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. There > are several problems: > # The termination of the responsibility chain should bind to the assigning > result instead of the rule. > # It should provide a reason when rejecting a request. > # Still need more useful rules: > ## RejectNonLeafQueue > ## RejectDefaultQueue > ## RejectUsers > ## RejectQueues > ## DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues
[ https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YARN-2257. - Resolution: Duplicate This has been fixed as part of YARN-8948, YARN-9298 and finally integrated in YARN-8967. Both schedulers use the same placement manager and placement rule code. The rules are different for both schedulers as the FS uses a slightly different setup with rule chaining and creation of queues that do not exist. The fix is in 3.3 and later: marking this as a duplicate of YARN-8967 > Add user to queue mappings to automatically place users' apps into specific > queues > -- > > Key: YARN-2257 > URL: https://issues.apache.org/jira/browse/YARN-2257 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Patrick Liu >Assignee: Vinod Kumar Vavilapalli >Priority: Major > Labels: features > > Currently, the fair-scheduler supports two modes, default queue or individual > queue for each user. > Apparently, the default queue is not a good option, because the resources > cannot be managed for each user or group. > However, individual queue for each user is not good enough. Especially when > connecting YARN with Hive. There will be an increasing number of Hive users in a corporate > environment. If we create a queue for a user, the resource management will be > hard to maintain. > I think the problem can be solved like this: > 1. Define user->queue mapping in Fair-Scheduler.xml. Inside each queue, use > aclSubmitApps to control users' ability to submit. > 2. Each time a user submits an app to YARN, if the user has mapped to a queue, > the app will be scheduled to that queue; otherwise, the app will be submitted > to the default queue. > 3. If the user cannot pass the aclSubmitApps limits, the app will not be accepted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
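The mapping scheme proposed in the description above can be sketched with the Fair Scheduler's existing placement policy plus per-queue submit ACLs. A minimal fair-scheduler.xml fragment; the queue, user and group names are made up:

```xml
<allocations>
  <queue name="etl">
    <!-- aclSubmitApps format is "userlist grouplist";
         a leading space means no users, only the etl group may submit -->
    <aclSubmitApps> etl</aclSubmitApps>
  </queue>
  <queue name="default" />
  <queuePlacementPolicy>
    <!-- use the queue named at submission time, but do not create it -->
    <rule name="specified" create="false" />
    <!-- otherwise fall back to the default queue -->
    <rule name="default" />
  </queuePlacementPolicy>
</allocations>
```

An app submitted to queue "etl" by a non-member is rejected by the ACL; an app with no queue set lands in "default", matching points 1-3 of the proposal.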
[jira] [Updated] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-4901: --- Attachment: YARN-4901-001.patch > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-4901-001.patch > > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802695#comment-16802695 ] Peter Bacsko commented on YARN-4901: It also affects {{TestApplicationLauncher.testAMLaunchAndCleanup}}. > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6389) [ATSv2] metricslimit query parameter do not work
[ https://issues.apache.org/jira/browse/YARN-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802691#comment-16802691 ] Prabhu Joseph commented on YARN-6389: - [~rohithsharma] Have tested metricslimit with hbase-1.2.6 and hadoop-3.2.0 and it is working fine. Can you check if there is any difference in the test. *Rest Api Output:* {code:java} [hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553684950097_0001?user.name=hbase&fields=METRICS&metricstoretrieve=YARN_APPLICATION_MEM_PREEMPT_METRIC&metricslimit=3" | jq . { "metrics": [ { "type": "TIME_SERIES", "id": "YARN_APPLICATION_MEM_PREEMPT_METRIC", "aggregationOp": "NOP", "values": { "1553684992594": 51481, "1553684992593": 51480, "1553684992592": 51479 } } ] } Have changed YARN_APPLICATION_MEM_PREEMPT_METRIC to TIME_SERIES type for testing purposes.{code} *Hbase Shell Output:* {code:java} hbase(main):006:0* scan 't1' ROW COLUMN+CELL row column=f1:c, timestamp=1553685646951, value=value1 1 row(s) in 0.0740 seconds hbase(main):010:0* scan 't1', { VERSIONS => 2} ROW COLUMN+CELL row column=f1:c, timestamp=1553685646951, value=value1 row column=f1:c, timestamp=1553685642756, value=value 1 row(s) in 0.0150 seconds{code} > [ATSv2] metricslimit query parameter do not work > > > Key: YARN-6389 > URL: https://issues.apache.org/jira/browse/YARN-6389 > Project: Hadoop YARN > Issue Type: Bug > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Varun Saxena >Priority: Critical > > It is observed that the metricslimit query parameter does not work. Also, even though > by default metricslimit is set to 1, all the metric versions are retrieved. > One thing I noticed is that even though GenericEntityReader is setting > Scan.setMaxVersions(), all the metrics are retrieved. It appears something is > wrong in the TimelineWriter or the way HBase filters are being used. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9269) Minor cleanup in FpgaResourceAllocator
[ https://issues.apache.org/jira/browse/YARN-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802618#comment-16802618 ] Peter Bacsko commented on YARN-9269: [~devaraj.k] [~sunilg] Could you guys review + commit it if it looks good? Thanks! > Minor cleanup in FpgaResourceAllocator > -- > > Key: YARN-9269 > URL: https://issues.apache.org/jira/browse/YARN-9269 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9269-001.patch, YARN-9269-002.patch, > YARN-9269-003.patch, YARN-9269-004.patch, YARN-9269-005.patch > > > Some stuff that we observed: > * {{addFpga()}} - we check for duplicate devices, but we don't print any > error/warning if there's any. > * {{findMatchedFpga()}} should be called {{findMatchingFpga()}}. Also, is > this method even needed? We already receive an {{FpgaDevice}} instance in > {{updateFpga()}} which I believe is the same that we're looking up. > * variable {{IPIDpreference}} is confusing > * {{availableFpga}} / {{usedFpgaByRequestor}} are instances of > {{LinkedHashMap}}. What's the rationale behind this? Doesn't a simple > {{HashMap}} suffice? > * {{usedFpgaByRequestor}} should be renamed, naming is a bit unclear > * {{allowedFpgas}} should be an immutable list > * {{@VisibleForTesting}} methods should be package private > * get rid of {{*}} imports -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
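On the immutable-list and duplicate-warning points in the review list above, a generic illustration. The class, field and method names here are made up for the sketch, not the actual FpgaResourceAllocator code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: warn on duplicate devices and expose the
// device list as an unmodifiable view so callers cannot mutate it.
public class DeviceRegistry {
    private final List<String> allowedFpgas = new ArrayList<>();

    public void addFpga(String id) {
        // Log a warning instead of silently dropping the duplicate
        if (allowedFpgas.contains(id)) {
            System.err.println("WARN: duplicate FPGA device ignored: " + id);
            return;
        }
        allowedFpgas.add(id);
    }

    // Callers get a read-only view; add/remove throws
    // UnsupportedOperationException instead of corrupting state.
    public List<String> getAllowedFpgas() {
        return Collections.unmodifiableList(allowedFpgas);
    }
}
```

Returning an unmodifiable view (or an immutable copy) is the usual way to keep an internal list like {{allowedFpgas}} safe from external mutation without extra locking.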
[jira] [Created] (YARN-9415) Document FS placement rule changes from YARN-8967
Wilfred Spiegelenburg created YARN-9415: --- Summary: Document FS placement rule changes from YARN-8967 Key: YARN-9415 URL: https://issues.apache.org/jira/browse/YARN-9415 Project: Hadoop YARN Issue Type: Improvement Components: documentation, fairscheduler Affects Versions: 3.3.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg With the changes introduced by YARN-8967 we now allow parent rules on all existing rules. This should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6567) Flexible Workload Management
[ https://issues.apache.org/jira/browse/YARN-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg reassigned YARN-6567: --- Assignee: Wilfred Spiegelenburg > Flexible Workload Management > > > Key: YARN-6567 > URL: https://issues.apache.org/jira/browse/YARN-6567 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Ajai Omtri >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: features > > Yarn workload management can be a little more dynamic. > 1. Create yarn pool by specifying more than one Secondary AD group. > Scenario: > In a multi-tenant cluster there can be hundreds of AD groups per tenant and > hundreds of users per AD group. We want a way to group like workloads into a > single yarn pool by specifying multiple secondary AD Groups. > Ex: All the ETL workloads of tenants need to go into one yarn pool. This > requires addition of all ETL related AD groups into one yarn pool. > 2. Demotions > Scenario: A particular workload/job has been started in a high priority yarn > pool based on the assumption that it would finish quickly but due to some > data issue/change in the code/query etc. - now it is running longer and > consuming high amounts of resources for a long time. In this case we want to > demote this to a lower resource allocated yarn pool. We don't want this one > run-away workload/job to dominate the cluster because our assumption was > wrong. > Ex: If any workload in the yarn pool runs for X minutes and/or consumes Y > resources either alert me or push to another yarn pool. Users can keep > demoting and can push to a yarn pool which has capped resources - like a > Penalty box. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9290) Invalid SchedulingRequest not rejected in Scheduler PlacementConstraintsHandler
[ https://issues.apache.org/jira/browse/YARN-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9290: Affects Version/s: 3.2.0 > Invalid SchedulingRequest not rejected in Scheduler > PlacementConstraintsHandler > > > Key: YARN-9290 > URL: https://issues.apache.org/jira/browse/YARN-9290 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9290-001.patch, YARN-9290-002.patch, > YARN-9290-003.patch > > > SchedulingRequest with Invalid namespace is not rejected in Scheduler > PlacementConstraintsHandler. RM keeps on trying to allocateOnNode with > logging the exception. This is rejected in case of placement-processor > handler. > {code} > 2019-02-08 16:51:27,548 WARN > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator: > Failed to query node cardinality: > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.InvalidAllocationTagsQueryException: > Invalid namespace prefix: notselfi, valid values are: > all,not-self,app-id,app-tag,self > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.fromString(TargetApplicationsNamespace.java:277) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.parse(TargetApplicationsNamespace.java:234) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.AllocationTags.createAllocationTags(AllocationTags.java:93) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraintExpression(PlacementConstraintsUtil.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraint(PlacementConstraintsUtil.java:240) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:321) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyAndConstraint(PlacementConstraintsUtil.java:272) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:324) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.checkCardinalityAndPending(SingleConstraintAppPlacementAllocator.java:355) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.precheckNode(SingleConstraintAppPlacementAllocator.java:395) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.precheckNode(AppSchedulingInfo.java:779) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.preCheckForNodeCandidateSet(RegularContainerAllocator.java:145) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:890) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:977) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1173) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:795) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1630) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1624) > at >
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802513#comment-16802513 ] Weiwei Yang commented on YARN-9413: --- Thanks for raising this issue, the fix makes sense. Also thanks for adding tests to cover this scenario. Can you submit the patch to trigger jenkins job? > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after AM finished at first time (am-max-attempts=1). > # App is started with 2 containers running on NM1 node. > # Fail the AM of the application with PREEMPTED exit status which should not > count towards max attempt retry but app will fail immediately. > # Used resource of this queue leaks after app fail. > The root cause is the inconsistency of handling app attempt failure between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After app fail, RMAppFailedAttemptEvent will be sent in > RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it > will not count towards max attempt retry, so that it will send > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handle > RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1. 
> # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in > CapacityScheduler#doneApplicationAttempt, it will skip killing and calling > the completion process for containers belonging to this app, so that the queue resource > leak happens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org