[jira] [Issue Comment Deleted] (YARN-9401) Fix `yarn version` printing the same version info as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wanqiang Ji updated YARN-9401: -- Comment: was deleted (was: Hi, [~eyang]. Can you help to review this?) > Fix `yarn version` printing the same version info as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by the `yarn` shell script mistakenly using > `org.apache.hadoop.util.VersionInfo` instead of > `org.apache.hadoop.yarn.util.YarnVersionInfo` as the `HADOOP_CLASSNAME`. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9366) Make logs in TimelineClient implementation specific to application
[ https://issues.apache.org/jira/browse/YARN-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803598#comment-16803598 ] Prabha Manepalli edited comment on YARN-9366 at 3/28/19 5:24 AM: - Thanks for finding this, [~abmodi]. I missed adding the TimelineEntities file to the patch. I will add it and upload a new patch. was (Author: prabham): Thanks for finding this, [~abmodi]. I missed adding the TimelineEntities file to the patch file. I will add it and upload a new patch. > Make logs in TimelineClient implementation specific to application > --- > > Key: YARN-9366 > URL: https://issues.apache.org/jira/browse/YARN-9366 > Project: Hadoop YARN > Issue Type: Improvement > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Minor > Attachments: YARN-9366.v1.patch > > > For every container launched on an NM node, a timeline client is created to > publish entities to the corresponding application's timeline collector, so > multiple timeline clients run at the same time. The current timeline client > logs are insufficient to isolate publishing problems related to one > application. Hence, creating this Jira to improve the logs in > TimelineV2ClientImpl.
[jira] [Commented] (YARN-9366) Make logs in TimelineClient implementation specific to application
[ https://issues.apache.org/jira/browse/YARN-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803598#comment-16803598 ] Prabha Manepalli commented on YARN-9366: Thanks for finding this, [~abmodi]. I missed adding the TimelineEntities file to the patch file. I will add it and upload a new patch. > Make logs in TimelineClient implementation specific to application > --- > > Key: YARN-9366 > URL: https://issues.apache.org/jira/browse/YARN-9366 > Project: Hadoop YARN > Issue Type: Improvement > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Minor > Attachments: YARN-9366.v1.patch > > > For every container launched on an NM node, a timeline client is created to > publish entities to the corresponding application's timeline collector, so > multiple timeline clients run at the same time. The current timeline client > logs are insufficient to isolate publishing problems related to one > application. Hence, creating this Jira to improve the logs in > TimelineV2ClientImpl.
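[Editorial note] The improvement requested above — making concurrent timeline-client log lines attributable to a single application — can be sketched as a tiny helper that prefixes every message with the application id. This is only an illustration of the idea, not the YARN-9366 patch; the class and method names are invented:

```java
// Hypothetical sketch (not the actual YARN-9366 change): prefix every
// timeline-client log line with the application id so that grepping the NM
// log for one appId isolates that application's publishing activity.
class AppScopedLog {
    private final String appId;

    AppScopedLog(String appId) {
        this.appId = appId;
    }

    // Returns the message decorated with the owning application's id.
    String format(String message) {
        return "[" + appId + "] " + message;
    }

    public static void main(String[] args) {
        AppScopedLog log = new AppScopedLog("application_1551103768202_0001");
        System.out.println(log.format("Publishing entity to timeline collector"));
    }
}
```

With such a prefix, publishing failures from one application can be separated from the dozens of other clients running on the same node.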
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803574#comment-16803574 ] Eric Yang commented on YARN-9292: - {quote}For images, we probably need to write command file to a path independent of containers under nmPrivate directory. Our code can ensure that once the command is executed, the temp .cmd file is deleted. I do think it is important that we don't expose this API with container/container id in it because there is no logical relation between the image and the container.{quote} The cmd file is placed in the application directory, which the current logic already deletes, so there is no additional cleanup code to be written. A side benefit is that the caller needs to know the running application ID to generate a container id that can call the docker images command. This makes it more difficult for an external party without a running app to get to the docker images command. The current code reduces exposure of the docker images command to unauthorized users, and is less likely to open a security hole in the PrivilegedOperation/container-executor flow for initializing the secure directory and cleaning it up. > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development for YARN-9184 to keep the docker image consistent > within a job.
One of the ideas to keep the :latest tag consistent for a job is > to use the docker images command to figure out the image id and > propagate it to the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start. > # If the image id is used to start other containers, container-executor may have > problems checking whether the image comes from a trusted source. Both the image > name and ID must be supplied through the .cmd file to container-executor. However, > an attacker can supply an incorrect image id and defeat container-executor security > checks. > If we can overcome those challenges, it may be possible to keep the docker image > consistent within one application.
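[Editorial note] The propagation idea in the description — resolve ":latest" to a concrete image id once, then hand the pinned id to every later container request — can be sketched as follows. The class name and the resolver callback are invented for illustration; the real lookup would be a "docker images" query on the first node, not this stand-in:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the idea in the YARN-9292 description (not the
// actual patch): pin a moving tag to the image id seen at first-container
// launch, so every subsequent container of the application uses the same id
// even if the tag is re-pushed mid-run.
class ImagePinner {
    private final Map<String, String> pinned = new HashMap<>();

    // 'resolver' stands in for a "docker images" lookup; it runs at most
    // once per tag, and later calls reuse the cached id.
    String pin(String tag, java.util.function.Function<String, String> resolver) {
        return pinned.computeIfAbsent(tag, resolver);
    }

    public static void main(String[] args) {
        ImagePinner pinner = new ImagePinner();
        String id = pinner.pin("centos:latest", tag -> "sha256:abc123");
        System.out.println("centos:latest pinned to " + id);
    }
}
```

Even if the registry's :latest moves while containers are still being requested, every request after the first sees the originally pinned id.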
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803551#comment-16803551 ] Chandni Singh commented on YARN-9292: - {quote} Real container id of the application master provides the already initialized path and .cmd file is stored in existing container directory. cmd file gets clean up when application is finished. Using randomly generated container id will not clean up as nicely. {quote} [~eyang] In patch 6, a random container id is already being created on the client side, which is the {{ServiceScheduler}}. It creates a container id from the appId and the current system time. {code} + ContainerId cid = ContainerId + .newContainerId(ApplicationAttemptId.newInstance(appId, 1), + System.currentTimeMillis()); {code} For images, we probably need to write the command file to a path independent of containers under the nmPrivate directory. Our code can ensure that once the command is executed, the temp .cmd file is deleted. I do think it is important that we don't expose this API with container/container id in it, because there is no logical relation between the image and the container. > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development for YARN-9184 to keep the docker image consistent > within a job.
One of the ideas to keep the :latest tag consistent for a job is > to use the docker images command to figure out the image id and > propagate it to the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start. > # If the image id is used to start other containers, container-executor may have > problems checking whether the image comes from a trusted source. Both the image > name and ID must be supplied through the .cmd file to container-executor. However, > an attacker can supply an incorrect image id and defeat container-executor security > checks. > If we can overcome those challenges, it may be possible to keep the docker image > consistent within one application.
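[Editorial note] The cleanup discipline debated in the comments above — write a temp .cmd file, run the command, and guarantee the file never outlives the call — is a standard write/execute/delete-in-finally pattern. A minimal sketch, with an invented class name and a callback standing in for the container-executor invocation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the temp-.cmd-file lifecycle discussed above (not
// the real container-executor flow): the file is created, handed to the
// executor stand-in, and deleted in a finally block so it cannot leak even
// when execution throws.
class CmdFileRunner {
    static String runWithCmdFile(String cmdContents,
            java.util.function.Function<Path, String> exec) throws IOException {
        Path cmdFile = Files.createTempFile("docker-images", ".cmd");
        try {
            Files.write(cmdFile, cmdContents.getBytes("UTF-8"));
            return exec.apply(cmdFile);    // stand-in for container-executor
        } finally {
            Files.deleteIfExists(cmdFile); // temp .cmd never outlives the call
        }
    }

    public static void main(String[] args) throws IOException {
        String out = runWithCmdFile("docker-command=images",
                path -> "executed with " + path.getFileName());
        System.out.println(out);
    }
}
```

Tying the file's lifetime to the call, rather than to a container directory, is what makes a container-independent path under nmPrivate workable.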
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803546#comment-16803546 ] Jonathan Hung commented on YARN-8200: - * TestSecureLogins failure related to HADOOP-16031 * [TestOpportunisticContainerAllocatorAMService.testContainerPromoteAndDemoteBeforeContainerStart|https://builds.apache.org/job/PreCommit-YARN-Build/23820/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestOpportunisticContainerAllocatorAMService/testContainerPromoteAndDemoteBeforeContainerStart/] related to YARN-8011 * TestOpportunisticContainerAllocatorAMService.testAppAttemptRemovalAfterNodeRemoval is already failing in branch-3.0 * TestNodeLabelContainerAllocation related to YARN-9006 > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-2.002.patch, YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready.
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803534#comment-16803534 ] Hadoop QA commented on YARN-8200: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 22m 7s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 26 new or modified test files. {color} | || || || || {color:brown} branch-3.0 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 23s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 44s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 6m 38s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 34s{color} | {color:green} branch-3.0 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 24s{color} | {color:green} branch-3.0 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 9s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 41s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 124 new + 1302 unchanged - 27 fixed = 1426 total (was 1329) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 12s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 524 line(s) with tabs. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 4s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} |
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803524#comment-16803524 ] Eric Yang commented on YARN-9292: - [~csingh] Thanks for the review. I tried using a randomly generated container id, but the nmPrivate directory needs to be initialized and tracked separately. The real container id of the application master provides the already initialized path, and the .cmd file is stored in the existing container directory. The cmd file gets cleaned up when the application is finished. Using a randomly generated container id will not clean up as nicely. I will make the logging change and add a new test for ServiceScheduler in the next patch. Thanks > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development for YARN-9184 to keep the docker image consistent > within a job. One of the ideas to keep the :latest tag consistent for a job is > to use the docker images command to figure out the image id and propagate it to > the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start.
> # If the image id is used to start other containers, container-executor may have > problems checking whether the image comes from a trusted source. Both the image > name and ID must be supplied through the .cmd file to container-executor. However, > an attacker can supply an incorrect image id and defeat container-executor security > checks. > If we can overcome those challenges, it may be possible to keep the docker image > consistent within one application.
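[Editorial note] Challenge #2 above — a forged image id riding in on a trusted image name — suggests pairing the two checks: accept an id only when the name is from a trusted registry AND the id matches what that name actually resolved to. A minimal sketch of that pairing (invented names; not container-executor's real check):

```java
import java.util.Map;

// Hypothetical sketch of the trust check described in challenge #2 (not the
// actual container-executor logic): the name must come from the trusted
// registry, and the id must equal the id that name resolved to, so a forged
// id cannot be smuggled in alongside a trusted name.
class TrustedImageCheck {
    static boolean allowed(String name, String id,
                           String trustedRegistry, Map<String, String> resolvedIds) {
        return name.startsWith(trustedRegistry + "/")
                && id.equals(resolvedIds.get(name));
    }

    public static void main(String[] args) {
        Map<String, String> resolved = new java.util.HashMap<>();
        resolved.put("local/centos:latest", "sha256:abc123");
        System.out.println(allowed("local/centos:latest", "sha256:abc123",
                "local", resolved));
    }
}
```

The key design point is that the id is never trusted on its own: it is only meaningful relative to the name's own resolution.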
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803519#comment-16803519 ] Jonathan Hung commented on YARN-8200: - Attached YARN-8200-branch-2.002 containing all the commits targeted for branch-2. (The commit list is at the YARN-8200 branch, which has been rebased on the latest branch-2.) > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-2.002.patch, YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready.
[jira] [Updated] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-8200: Attachment: YARN-8200-branch-2.002.patch > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-2.002.patch, YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready.
[jira] [Assigned] (YARN-6063) Inline java doc with Hadoop formatter
[ https://issues.apache.org/jira/browse/YARN-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WEI-HSIAO-LEE reassigned YARN-6063: --- Assignee: WEI-HSIAO-LEE > Inline java doc with Hadoop formatter > - > > Key: YARN-6063 > URL: https://issues.apache.org/jira/browse/YARN-6063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: WEI-HSIAO-LEE >Priority: Trivial > Labels: newbie > > Most of the javadoc in the TimelineReader classes does not meet the > Hadoop formatter. This means every patch preparation needs extra attention on > top of the real fix. It would be better for the source code to be in line with > the Hadoop formatter.
[jira] [Commented] (YARN-9412) Backport YARN-6909 to branch-2
[ https://issues.apache.org/jira/browse/YARN-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803465#comment-16803465 ] Anthony Hsu commented on YARN-9412: --- Awesome! > Backport YARN-6909 to branch-2 > -- > > Key: YARN-9412 > URL: https://issues.apache.org/jira/browse/YARN-9412 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major >
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803429#comment-16803429 ] Chandni Singh commented on YARN-9292: - [~eyang] The REST API added here to find the image is independent of any container, so I don't think we should have the container and container id in the path. {code} @Path("/container/{id}/docker/images/{name}") {code} If this is done because the DockerCommandExecutor needs a container id, we could change the implementation here to use a dummy container id. This implementation could be fixed later, but the REST API will not be affected and will remain unchanged. {code} String output = DockerCommandExecutor.executeDockerCommand( dockerImagesCommand, id, null, privOpExecutor, false, nmContext); {code} We could generate a dummy container id here instead of doing it in every client. Some other nitpicks: 1. Log statements in ServiceScheduler can be parameterized, which improves readability. {code} LOG.info("Docker image: " + id + " maps to: " + imageId); -> LOG.info("Docker image: {} maps to: {}", id, imageId); {code} 2. There aren't any tests for the new code added to {{ServiceScheduler}}. Would it be possible to add one? > Implement logic to keep docker image consistent in application that uses > :latest tag > > > Key: YARN-9292 > URL: https://issues.apache.org/jira/browse/YARN-9292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9292.001.patch, YARN-9292.002.patch, > YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, > YARN-9292.006.patch > > > A docker image with the latest tag can run in a YARN cluster without any validation > in node managers. If an image with the latest tag is changed during container > launch, it might produce inconsistent results between nodes. This surfaced > toward the end of development for YARN-9184 to keep the docker image consistent > within a job.
One of the ideas to keep the :latest tag consistent for a job is > to use the docker images command to figure out the image id and > propagate it to the rest of the container requests. There are some challenges to > overcome: > # The latest tag does not exist on the node where the first container starts. > The first container will need to download the latest image and find the image > ID. This can introduce lag time for other containers to start. > # If the image id is used to start other containers, container-executor may have > problems checking whether the image comes from a trusted source. Both the image > name and ID must be supplied through the .cmd file to container-executor. However, > an attacker can supply an incorrect image id and defeat container-executor security > checks. > If we can overcome those challenges, it may be possible to keep the docker image > consistent within one application.
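[Editorial note] The parameterized-logging nitpick in the comment above relies on SLF4J's "{}" placeholders, which defer string building until the message is actually logged. The substitution itself is simple; a self-contained illustration of what the placeholders do (the real formatter is SLF4J's, not this class):

```java
// Minimal illustration of SLF4J-style "{}" substitution, as suggested in the
// review comment above. This is NOT the SLF4J implementation; it only shows
// what LOG.info("Docker image: {} maps to: {}", id, imageId) expands to.
class ParamLog {
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        // Replace each "{}" in order with the next argument.
        while ((at = pattern.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(pattern, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }

    public static void main(String[] args) {
        System.out.println(format("Docker image: {} maps to: {}",
                "centos:latest", "sha256:abc123"));
    }
}
```

Besides readability, the parameterized form avoids concatenation cost when the log level is disabled, since arguments are only stringified if the message is emitted.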
[jira] [Commented] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
[ https://issues.apache.org/jira/browse/YARN-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803416#comment-16803416 ] Eric Yang commented on YARN-9421: - There are a few corner cases to consider. If the size of the YARN cluster changes frequently, the safe mode mechanism might kick in at random times. If jobs are queued during safe mode, job queue tracking also increases the memory usage of the resource manager. At some point, the queue will be full because there is a finite amount of tracking memory in the resource manager. What happens if the job queue is full, and what happens if jobs take too long to start and miss their SLA? If the job queue is full, it falls back to the same type of error message showing resources unavailable. It might be better to let the client-side retry decision kick in sooner rather than queuing and finding out the queue is full later. Option 2 is an option to mask transient problems, but the retry logic still depends on the client to make the right decision. I think the default behavior does not need to change for production clusters, but option 2 is nice to have for improving the user experience on test clusters. > Implement SafeMode for ResourceManager by defining a resource threshold > --- > > Key: YARN-9421 > URL: https://issues.apache.org/jira/browse/YARN-9421 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Szilard Nemeth >Priority: Major > Attachments: client-log.log, nodemanager.log, resourcemanager.log > > > We have a hypothetical testcase in our test suite that tests Resource Types. > The test does the following: > 1. Sets up a resource named "gpu" > 2. Out of 9 NodeManager nodes, 1 node has 100 of "gpu". > 3.
It executes a sleep job with resource requests: > "-Dmapreduce.reduce.resource.gpu=7" and > "-Dyarn.app.mapreduce.am.resource.gpu=11" > Sometimes, we encounter situations where the app submission fails with: > {code:java} > 2019-02-25 06:09:56,795 WARN > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RM app submission > failed in validating AM resource request for application > application_1551103768202_0001 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is greater > than maximum allowed allocation. Requested resource type=[gpu], Requested > resource=, maximum allowed > allocation=, please note that maximum allowed > allocation is calculated by scheduler based on maximum resource of registered > NodeManagers, which might be less than configured maximum > allocation={code} > It's clearly visible that the maximum allowed allocation does not have any > "gpu" resources. > > Looking into the logs further, I realized that sometimes the node with the > "gpu" resources registers after the app is submitted. > In a real-world situation, and even with this very special test execution, we > can't be sure in which order NMs register with the RM. > With the advent of resource types, this issue is more likely to surface. > If we have a cluster with some "rare" resources like GPUs only on some nodes > out of a 100, we can quickly run into a situation where the NMs with GPUs > register later than the normal nodes. While the critical NMs are still > registering, we will most likely experience the same > InvalidResourceRequestException if we submit jobs requesting GPUs. > There is a naive solution to this: > 1. Give the RM some time to wait for NMs to register > and put submitted applications on hold. This could work in some situations, > but it's not the most flexible solution, as different clusters can have > different requirements.
Of course, we can make this more flexible by making > the timeout value configurable. > *A more flexible alternative would be:* > 2. We define a threshold of Resource capability: While we haven't reached > this threshold, we put submitted jobs on hold. Once we reached the threshold, > we enable jobs to pass through. > This is very similar to an already existing concept, the SafeMode in HDFS > ([https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode]). > Back to my GPU example above, the threshold could be: 8 vcores, 16GB, 3 > GPUs. > Defining a threshold like this, we can ensure most of the submitted jobs > won't be lost, just "parked" until NMs are registered. > The final solution could be the Resource threshold, or the combination of the > threshold and timeout value. I'm open for any other suggestion as well. > *Last but not least, a very easy way to reproduce the issue on a 3 node > cluster:* > 1. Configure a resource type, named 'testres'. > 2. Node1 runs RM, Node 2/3 runs NMs > 3.
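The threshold idea in option 2 could be sketched roughly as follows. This is a minimal, hypothetical sketch in plain Java: `SafeModeSketch` and its method names are illustrative only, not actual YARN ResourceManager APIs. It only shows the admission decision, not queuing or timeouts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "option 2" resource-threshold SafeMode idea:
// submitted apps are parked until the aggregate resources of registered
// NodeManagers reach a configured threshold (e.g. 8 vcores, 16GB, 3 GPUs).
public class SafeModeSketch {
    private final Map<String, Long> threshold;
    private final Map<String, Long> registered = new HashMap<>();

    public SafeModeSketch(Map<String, Long> threshold) {
        this.threshold = threshold;
    }

    // Called when a NodeManager registers; adds its capacity to the running total.
    public void onNodeRegistered(Map<String, Long> nodeCapacity) {
        nodeCapacity.forEach((res, amount) -> registered.merge(res, amount, Long::sum));
    }

    // SafeMode stays on while any thresholded resource is not yet covered
    // by registered capacity; apps submitted meanwhile would be parked.
    public boolean inSafeMode() {
        return threshold.entrySet().stream()
                .anyMatch(e -> registered.getOrDefault(e.getKey(), 0L) < e.getValue());
    }

    public static void main(String[] args) {
        Map<String, Long> t = new HashMap<>();
        t.put("vcores", 8L);
        t.put("memory-mb", 16384L);
        t.put("gpu", 3L);
        SafeModeSketch sm = new SafeModeSketch(t);

        Map<String, Long> plainNode = new HashMap<>();
        plainNode.put("vcores", 16L);
        plainNode.put("memory-mb", 32768L);
        sm.onNodeRegistered(plainNode);
        System.out.println(sm.inSafeMode());   // true: still waiting for the GPU node

        Map<String, Long> gpuNode = new HashMap<>(plainNode);
        gpuNode.put("gpu", 100L);
        sm.onNodeRegistered(gpuNode);
        System.out.println(sm.inSafeMode());   // false: threshold reached, jobs may pass
    }
}
```

A real implementation would also need to handle nodes unregistering (which could push the cluster back under the threshold), which is one of the corner cases raised in the comment above.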
[jira] [Updated] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
[ https://issues.apache.org/jira/browse/YARN-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9421: - Attachment: resourcemanager.log client-log.log nodemanager.log > Implement SafeMode for ResourceManager by defining a resource threshold > --- > > Key: YARN-9421 > URL: https://issues.apache.org/jira/browse/YARN-9421 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Szilard Nemeth >Priority: Major > Attachments: client-log.log, nodemanager.log, resourcemanager.log > > > We have a hypothetical testcase in our test suite that tests Resource Types. > The test does the following: > 1. Sets up a resource named "gpu" > 2. Out of 9 NodeManager nodes, 1 node has 100 of "gpu". > 3. It executes a sleep job with resource requests: > "-Dmapreduce.reduce.resource.gpu=7" and > "-Dyarn.app.mapreduce.am.resource.gpu=11" > Sometimes, we encounter situations when the app submission fails with: > {code:java} > 2019-02-25 06:09:56,795 WARN > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RM app submission > failed in validating AM resource request for application > application_1551103768202_0001 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is greater > than maximum allowed allocation. Requested resource type=[gpu], Requested > resource=, maximum allowed > allocation=, please note that maximum allowed > allocation is calculated by scheduler based on maximum resource of registered > NodeManagers, which might be less than configured maximum > allocation={code} > It's clearly visible that the maximum allowed allocation does not have any > "gpu" resources. > > Looking into the logs further, I realized that sometimes the node having the > "gpu" resources is registered after the app is submitted. > In a real-world situation, and even with this very special test execution, we > can't be sure in which order NMs register with the RM. > With the advent of resource types, this issue is more likely to surface. > If we have a cluster with some "rare" resources like GPUs only on some nodes > out of 100, we can quickly run into a situation where the NMs with GPUs > register later than the normal nodes. While the critical NMs are still > registering, we will most likely experience the same > InvalidResourceRequestException if we submit jobs requesting GPUs. > There is a naive solution to this: > 1. Give the RM some time to wait for NMs to register, and put submitted > applications on hold. This could work in some situations, but it's not the > most flexible solution, as different clusters can have different > requirements. Of course, we can make this more flexible by making the > timeout value configurable. > *A more flexible alternative would be:* > 2. We define a threshold of Resource capability: while we haven't reached > this threshold, we put submitted jobs on hold. Once we have reached the > threshold, we let jobs pass through. > This is very similar to an already existing concept, the SafeMode in HDFS > ([https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode]). > Back to my GPU example above, the threshold could be: 8 vcores, 16GB, 3 > GPUs. > Defining a threshold like this, we can ensure most of the submitted jobs > won't be lost, just "parked" until the NMs are registered. > The final solution could be the Resource threshold, or a combination of the > threshold and a timeout value. I'm open to any other suggestions as well. > *Last but not least, a very easy way to reproduce the issue on a 3-node > cluster:* > 1. Configure a resource type, named 'testres'. > 2. Node1 runs RM, Node 2/3 run NMs > 3. Node2 has 1 testres > 4. Node3 has 0 testres > 5. Stop all nodes > 6. Start RM on Node1 > 7. Start NM on Node3 (the one without the resource) > 8. Start a pi job, request 1 testres for the AM > Here's the command to start the job: > {code:java} > MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" > pi -Dyarn.app.mapreduce.am.resource.testres=1 1 1000;popd{code} > > *Configurations*: > node1: yarn-site.xml of ResourceManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property>{code} > node2: yarn-site.xml of NodeManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property> > <property> > <name>yarn.nodemanager.resource-type.testres</name> > <value>1</value> > </property>{code} > node3: yarn-site.xml of NodeManager: > {code:java} > <property> > <name>yarn.resource-types</name> > <value>testres</value> > </property>{code} > Please see the full process logs from RM, NM, and the YARN client attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
[ https://issues.apache.org/jira/browse/YARN-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9421: Assignee: (was: Szilard Nemeth) > Implement SafeMode for ResourceManager by defining a resource threshold > --- > > Key: YARN-9421 > URL: https://issues.apache.org/jira/browse/YARN-9421 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Szilard Nemeth >Priority: Major -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9421) Implement SafeMode for ResourceManager by defining a resource threshold
Szilard Nemeth created YARN-9421: Summary: Implement SafeMode for ResourceManager by defining a resource threshold Key: YARN-9421 URL: https://issues.apache.org/jira/browse/YARN-9421 Project: Hadoop YARN Issue Type: New Feature Reporter: Szilard Nemeth Assignee: Szilard Nemeth -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: YARN-9281.006.patch > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch, YARN-9281.005.patch, > YARN-9281.006.patch > > > It would be nice to have the ability to upgrade applications deployed by > Application catalog from the Application catalog UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803298#comment-16803298 ] Eric Yang commented on YARN-9281: - Patch 4 contained a section duplicated from YARN-9255. Patch 5 removes the duplicated section. > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch, YARN-9281.005.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: YARN-9281.005.patch > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch, YARN-9281.005.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803287#comment-16803287 ] Hadoop QA commented on YARN-9281: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-9281 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9281 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963933/YARN-9281.004.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23819/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803282#comment-16803282 ] Eric Yang commented on YARN-9281: - Patch 004 moved the build-related changes into YARN-9348 patch 008, and includes some rebasing for changes that happened in YARN-7129. > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9420) Avoid potentially dangerous filename concatenation in native code (cgroups-operations.c)
Szilard Nemeth created YARN-9420: Summary: Avoid potentially dangerous filename concatenation in native code (cgroups-operations.c) Key: YARN-9420 URL: https://issues.apache.org/jira/browse/YARN-9420 Project: Hadoop YARN Issue Type: Improvement Reporter: Szilard Nemeth In cgroups-operations.c, in the function get_cgroups_path_to_write, at the end of the function there's a string formatting operation: {code:java} if (snprintf(buffer, MAX_PATH_LEN, "%s/%s/%s/%s/%s.%s", cgroups_root, hierarchy_name, yarn_hierarchy_name, group_id, hierarchy_name, param_name) < 0) { fprintf(ERRORFILE, "Failed to print output path.\n"); failed = 1; goto cleanup; }{code} This function is called from just one place: update_cgroups_parameters. All calls of update_cgroups_parameters look like this (note that only the last parameter differs): {code:java} update_cgroups_parameters_func_p("devices", "deny", container_id, param_value);{code} So essentially, get_cgroups_path_to_write will have these arguments: 1. hierarchy_name: "devices" 2. param_name: "deny" 3. group_id: container_id An example of a full path: {code:java} /var/lib/yarn-ce/cgroups/devices/hadoop-yarn/c_1/devices.deny{code} , where: 1. cgroups_root = "/var/lib/yarn-ce/cgroups" 2. hierarchy_name = "devices" 3. yarn_hierarchy_name = "/hadoop-yarn" 4. group_id = "c_1" 5. param_name = "deny" The problem is that the last bit of the format string ("%s.%s") relies on the fact that the variable hierarchy_name holds the value "devices", so it can be reused both in the path and in the filename ("devices.deny"). It would be clearer if param_name held the whole filename as-is, e.g. "devices.deny", instead of manually constructing it from 2 strings. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
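The suggested cleanup can be illustrated with the two path-building schemes side by side. This is a sketch in Java for brevity (the real code in cgroups-operations.c is C); the method names are hypothetical, and yarn_hierarchy_name is written without the leading slash here so the example path matches the one quoted in the issue.

```java
// Sketch comparing the current path construction in get_cgroups_path_to_write
// (which glues hierarchy_name and param_name together with "%s.%s") with the
// suggested variant, where the caller passes the full cgroup filename.
public class CgroupsPathSketch {
    // Current scheme: relies on hierarchy_name doubling as the filename prefix.
    static String currentScheme(String root, String hierarchy, String yarnHierarchy,
                                String groupId, String param) {
        return String.format("%s/%s/%s/%s/%s.%s",
                root, hierarchy, yarnHierarchy, groupId, hierarchy, param);
    }

    // Suggested scheme: the caller supplies the whole filename explicitly,
    // e.g. "devices.deny", so no implicit coupling to hierarchy_name remains.
    static String suggestedScheme(String root, String hierarchy, String yarnHierarchy,
                                  String groupId, String fileName) {
        return String.format("%s/%s/%s/%s/%s",
                root, hierarchy, yarnHierarchy, groupId, fileName);
    }

    public static void main(String[] args) {
        String a = currentScheme("/var/lib/yarn-ce/cgroups", "devices",
                "hadoop-yarn", "c_1", "deny");
        String b = suggestedScheme("/var/lib/yarn-ce/cgroups", "devices",
                "hadoop-yarn", "c_1", "devices.deny");
        // Both print /var/lib/yarn-ce/cgroups/devices/hadoop-yarn/c_1/devices.deny
        System.out.println(a);
        System.out.println(b);
    }
}
```

Both schemes produce the same path for this input; the point of the suggestion is that the second makes the filename explicit at the call site instead of deriving half of it from hierarchy_name.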
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: YARN-9281.004.patch > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch, YARN-9281.004.patch > > > It would be nice to have ability to upgrade applications deployed by > Application catalog from Application catalog UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9419) Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
[ https://issues.apache.org/jira/browse/YARN-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-9419: Assignee: Gergely Pollak > Log a warning if GPU isolation is enabled but LinuxContainerExecutor is > disabled > > > Key: YARN-9419 > URL: https://issues.apache.org/jira/browse/YARN-9419 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Gergely Pollak >Priority: Major > > At the very least, a WARN log (logged once on startup) should be added that > notifies the user about a potentially offending configuration: GPU isolation > is enabled but LCE is disabled. > I think this is a dangerous, yet valid configuration: as LCE is the only > container executor that utilizes cgroups, no real HW isolation happens if LCE > is disabled. > Let's suppose we have 2 GPU devices in 1 node: > # NM reports 2 devices (as a Resource) to RM > # RM assigns GPU#1 to container#1, which requests 1 GPU device > # When container#2 also requests 1 GPU device, RM is going to assign > either GPU#1 or GPU#2, so there's no guarantee that GPU#2 will be assigned. > If GPU#1 is assigned to a second container, nasty things could happen. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9419) Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
Szilard Nemeth created YARN-9419: Summary: Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled Key: YARN-9419 URL: https://issues.apache.org/jira/browse/YARN-9419 Project: Hadoop YARN Issue Type: Bug Reporter: Szilard Nemeth -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
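The startup validation proposed in YARN-9419 could look roughly like the sketch below. The property keys (yarn.nodemanager.resource-plugins, yarn.nodemanager.container-executor.class) and the yarn.io/gpu plugin name match real YARN configuration to the best of my knowledge, but the surrounding check is illustrative, not the actual NodeManager code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the warning check proposed in YARN-9419:
// GPU isolation enabled without LinuxContainerExecutor means no cgroups-based
// HW isolation actually happens, so the NM should warn once at startup.
public class GpuIsolationCheck {
    static boolean shouldWarn(Map<String, String> conf) {
        String plugins = conf.getOrDefault("yarn.nodemanager.resource-plugins", "");
        String executor = conf.getOrDefault("yarn.nodemanager.container-executor.class", "");
        boolean gpuEnabled = plugins.contains("yarn.io/gpu");
        boolean lce = executor.endsWith("LinuxContainerExecutor");
        return gpuEnabled && !lce;  // dangerous, yet valid configuration
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.nodemanager.resource-plugins", "yarn.io/gpu");
        conf.put("yarn.nodemanager.container-executor.class",
                "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor");
        if (shouldWarn(conf)) {
            System.out.println("WARN: GPU isolation is enabled but "
                    + "LinuxContainerExecutor is not in use; no HW isolation will happen.");
        }
    }
}
```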
[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803272#comment-16803272 ] Eric Yang commented on YARN-9348: - Patch 008 fixed a git rebase issue in Patch 007. > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch > > > A couple of reports show that Jenkins precommit builds are failing due to an > integration problem between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which > generates many errors. One possible solution is to move the nodejs libraries > from the project top-level directory to the target directory, so they don't > trip the whitespace checks. > # maven clean fails because the clean plugin tries to remove the target > directory and files inside the target/generated-sources directories, causing > race conditions. > # Building on macOS triggers access to the OS X keychain when attempting to > log in to Docker Hub. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.008.patch > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803251#comment-16803251 ] Jonathan Hung commented on YARN-8200: - Attached YARN-8200-branch-3.0.001 containing all the commits targeted for branch-3.0. (The commit list is at the YARN-8200.branch3 branch, which has been rebased on the latest branch-3.0.) > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json > > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To avoid supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans Docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-8200: Attachment: YARN-8200-branch-3.0.001.patch > Backport resource types/GPU features to branch-3.0/branch-2 > --- > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-8200-branch-2.001.patch, > YARN-8200-branch-3.0.001.patch, > counter.scheduler.operation.allocate.csv.defaultResources, > counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803250#comment-16803250 ] Jonathan Hung commented on YARN-8200: - Yes [~Jim_Brennan], we plan to do a 2.10 release with this feature.
[jira] [Resolved] (YARN-9412) Backport YARN-6909 to branch-2
[ https://issues.apache.org/jira/browse/YARN-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung resolved YARN-9412. - Resolution: Fixed This ended up being a clean port. Closing. > Backport YARN-6909 to branch-2 > -- > > Key: YARN-9412 > URL: https://issues.apache.org/jira/browse/YARN-9412 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major >
[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803198#comment-16803198 ] Eric Yang commented on YARN-9348: - Rebased patch 007 for changes made in YARN-7129 patch 035. - Moved the parallel-tests profile from YARN-9281 because the change set matches the context of this issue. > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch > > > A couple of reports show Jenkins precommit builds failing due to integration problems between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which generates many errors. One possible solution is to move the nodejs libraries from the project top-level directory to the target directory so that they do not stumble on whitespace checks. > # maven clean fails because the clean plugin tries to remove the target directory and files inside the target/generated-sources directories, causing race conditions. > # Building on Mac triggers access to the OS X keychain in an attempt to log in to Dockerhub.
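For the maven clean race described above, one possible shape of a fix is to take over the clean plugin's fileset so it skips the contested generated-sources output. This is only an illustrative sketch of the `maven-clean-plugin` configuration, not the actual change in patch 007:

```xml
<!-- Sketch: keep maven clean from deleting target/generated-sources,
     where another plugin may still be writing files concurrently.
     Not taken from the YARN-9348 patch. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <configuration>
    <!-- Disable the default deletion of the whole target directory -->
    <excludeDefaultDirectories>true</excludeDefaultDirectories>
    <filesets>
      <fileset>
        <directory>${project.build.directory}</directory>
        <excludes>
          <!-- Leave generated-sources alone to avoid the race -->
          <exclude>generated-sources/**</exclude>
        </excludes>
      </fileset>
    </filesets>
  </configuration>
</plugin>
```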
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.007.patch
[jira] [Created] (YARN-9418) ATSv2 /apps/${appId}/entities/YARN_CONTAINER REST API does not show metrics
Prabhu Joseph created YARN-9418: --- Summary: ATSv2 /apps/${appId}/entities/YARN_CONTAINER REST API does not show metrics Key: YARN-9418 URL: https://issues.apache.org/jira/browse/YARN-9418 Project: Hadoop YARN Issue Type: Bug Components: ATSv2 Affects Versions: 3.2.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph The ATSv2 entities REST API does not show the metrics: {code:java} [hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq . { "metrics": [], "events": [], "createdtime": 1553695002014, "idprefix": 0, "type": "YARN_CONTAINER", "id": "container_e18_1553685341603_0006_01_01", "info": { "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" }, "configs": {}, "isrelatedto": {}, "relatesto": {} }{code} The NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but these are not shown in the output above. The NM container entries are written with the correct flowRunId (the start time of the job), whereas the RM container entries are written with the default 0. TimelineReader fetches only the rows written by the RM (i.e., row keys with flowRunId 0).
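The entity query shown above follows the ATSv2 per-app entities path, with `fields=METRICS` selecting the metrics field. A minimal sketch of how such a URL is assembled (the host, port, and helper name here are illustrative, not from the reader code):

```java
// Sketch: assemble an ATSv2 entity-query URL like the curl command in the
// report. Only string construction; no HTTP call is made here.
public class TimelineEntityUrl {
    static String entityUrl(String host, int port, String appId,
                            String entityType, String entityId) {
        // fields=METRICS asks the TimelineReader to include the entity's
        // metrics in the response; user.name identifies the caller on an
        // unsecured cluster.
        return String.format(
            "http://%s:%d/ws/v2/timeline/apps/%s/entities/%s/%s"
                + "?user.name=hbase&fields=METRICS",
            host, port, appId, entityType, entityId);
    }

    public static void main(String[] args) {
        System.out.println(entityUrl("yarn-ats-3", 8198,
            "application_1553685341603_0006", "YARN_CONTAINER",
            "container_e18_1553685341603_0006_01_01"));
    }
}
```

A query built this way still returns empty metrics for the bug described above, because the reader only matches rows keyed with flowRunId 0.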
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.006.patch
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: (was: YARN-9281.004.patch) > Add express upgrade button to Appcatalog UI > --- > > Key: YARN-9281 > URL: https://issues.apache.org/jira/browse/YARN-9281 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9281.001.patch, YARN-9281.002.patch, > YARN-9281.003.patch > > > It would be nice to have the ability to upgrade applications deployed by the Application catalog from the Application catalog UI.
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: (was: YARN-9348.006.patch)
[jira] [Issue Comment Deleted] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Comment: was deleted (was: Rebase patch to match changes happened in YARN-7129.)
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.006.patch
[jira] [Updated] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9281: Attachment: YARN-9281.004.patch
[jira] [Commented] (YARN-9281) Add express upgrade button to Appcatalog UI
[ https://issues.apache.org/jira/browse/YARN-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803165#comment-16803165 ] Eric Yang commented on YARN-9281: - Rebased the patch to match changes made in YARN-7129.
[jira] [Comment Edited] (YARN-9409) Port resource type changes from YARN-7237 to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803163#comment-16803163 ] Jonathan Hung edited comment on YARN-9409 at 3/27/19 6:11 PM: -- Thanks Zhe, committed to YARN-8200 and YARN-8200.branch3 was (Author: jhung): Thanks Zhe, committed to YARN-8200 > Port resource type changes from YARN-7237 to branch-3.0/branch-2 > > > Key: YARN-9409 > URL: https://issues.apache.org/jira/browse/YARN-9409 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9409-YARN-8200.branch3.001.patch > >
[jira] [Commented] (YARN-9411) TestYarnNativeServices fails sporadically with bind address in use
[ https://issues.apache.org/jira/browse/YARN-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803152#comment-16803152 ] Prabhu Joseph commented on YARN-9411: - Thanks [~giovanni.fumarola]! > TestYarnNativeServices fails sporadically with bind address in use > -- > > Key: YARN-9411 > URL: https://issues.apache.org/jira/browse/YARN-9411 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn-native-services >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9411-001.patch > > > TestYarnNativeServices fails sporadically with bind address in use > {code} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [034772d29930:45301] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:373) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:128) > at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:503) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:322) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceTestUtils.setupInternal(ServiceTestUtils.java:273) > at > org.apache.hadoop.yarn.service.TestYarnNativeServices.testCreateFlexStopDestroyService(TestYarnNativeServices.java:101) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 
at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.net.BindException: Problem binding to [034772d29930:45301] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:66) > at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:55) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.getServer(ApplicationMasterService.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:191) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:918) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1285) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1326) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891) >
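The BindException above is the classic symptom of concurrent test runs racing for the same hard-coded port. One common remedy (a sketch of the general technique, not necessarily what the YARN-9411 patch does in ServiceTestUtils) is to let the OS hand out an ephemeral port by binding to port 0:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: ask the OS for a currently-free ephemeral port so concurrent test
// JVMs do not collide on a fixed port. Note there is still a small window
// between closing the probe socket and the server binding the port, so this
// reduces, rather than eliminates, bind races.
public class FreePort {
    static int freePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            // SO_REUSEADDR lets the port be rebound promptly after the
            // probe socket closes (avoids TIME_WAIT delays).
            probe.setReuseAddress(true);
            return probe.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("picked port " + freePort());
    }
}
```

A test would call `freePort()` once per server address before starting the MiniYARNCluster, instead of reusing a fixed port across runs.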
[jira] [Commented] (YARN-9411) TestYarnNativeServices fails sporadically with bind address in use
[ https://issues.apache.org/jira/browse/YARN-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803149#comment-16803149 ] Hudson commented on YARN-9411: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16296 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16296/]) YARN-9411. TestYarnNativeServices fails sporadically with bind address (gifuma: rev 9cd66198ee8c2e531fa17a306e33c49d054a1ef7) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java
[jira] [Commented] (YARN-9411) TestYarnNativeServices fails sporadically with bind address in use
[ https://issues.apache.org/jira/browse/YARN-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803132#comment-16803132 ] Giovanni Matteo Fumarola commented on YARN-9411: Thanks [~Prabhu Joseph]. LGTM +1. Committed to trunk.
[jira] [Updated] (YARN-9411) TestYarnNativeServices fails sporadically with bind address in use
[ https://issues.apache.org/jira/browse/YARN-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-9411: --- Fix Version/s: 3.3.0
[jira] [Commented] (YARN-9269) Minor cleanup in FpgaResourceAllocator
[ https://issues.apache.org/jira/browse/YARN-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803117#comment-16803117 ] Hudson commented on YARN-9269: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16295 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16295/]) YARN-9269. Minor cleanup in FpgaResourceAllocator. Contributed by Peter (devaraj: rev a4cd75e09c934699ec5e2fa969f1c8d0a14c1d49) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/FpgaResourceAllocator.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/FpgaResourceHandlerImpl.java > Minor cleanup in FpgaResourceAllocator > -- > > Key: YARN-9269 > URL: https://issues.apache.org/jira/browse/YARN-9269 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9269-001.patch, YARN-9269-002.patch, > YARN-9269-003.patch, YARN-9269-004.patch, YARN-9269-005.patch > > > Some stuff that we observed: > * {{addFpga()}} - we check for duplicate devices, but we don't print any > error/warning if there's any. > * {{findMatchedFpga()}} should be called {{findMatchingFpga()}}. Also, is > this method even needed? We already receive an {{FpgaDevice}} instance in > {{updateFpga()}} which I believe is the same that we're looking up. > * variable {{IPIDpreference}} is confusing > * {{availableFpga}} / {{usedFpgaByRequestor}} are instances of > {{LinkedHashMap}}. What's the rationale behind this? Doesn't a simple > {{HashMap}} suffice? 
> * {{usedFpgaByRequestor}} should be renamed, naming is a bit unclear > * {{allowedFpgas}} should be an immutable list > * {{@VisibleForTesting}} methods should be package private > * get rid of {{*}} imports -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
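Two of the cleanup points above (the immutable {{allowedFpgas}} list and the LinkedHashMap-vs-HashMap question) can be sketched in plain Java. The field and device names below are illustrative stand-ins, not the actual FpgaResourceAllocator members:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        // Immutable view for a field like allowedFpgas: callers can read it,
        // but any mutation attempt fails fast instead of silently corrupting state.
        List<String> allowedFpgas = Collections.unmodifiableList(
                new ArrayList<>(Arrays.asList("IntelOpenCL-acl0", "IntelOpenCL-acl1")));

        // A plain HashMap is enough when iteration order carries no meaning;
        // LinkedHashMap only buys predictable iteration order at extra cost.
        Map<String, List<String>> availableFpga = new HashMap<>();
        availableFpga.put("IntelOpenCL", allowedFpgas);

        try {
            allowedFpgas.add("IntelOpenCL-acl2");
        } catch (UnsupportedOperationException e) {
            System.out.println("allowedFpgas is immutable");
        }
    }
}
```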
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803090#comment-16803090 ] Devaraj K commented on YARN-9270: - [~pbacsko], can you rebase this patch? > Minor cleanup in TestFpgaDiscoverer > --- > > Key: YARN-9270 > URL: https://issues.apache.org/jira/browse/YARN-9270 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9270-001.patch, YARN-9270-002.patch, > YARN-9270-003.patch > > > Let's do some cleanup in this class. > * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split > up into 5 different tests, because it tests 5 different scenarios. > * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a > {{Function}} in the plugin class like {{Function<String, String> envProvider > = System::getenv}} plus a setter method which allows the test to modify > {{envProvider}}. Much simpler and more straightforward. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
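The injectable-environment idea proposed above can be sketched like this. The class, method, and environment-variable names are hypothetical placeholders, not the actual FpgaDiscoverer API:

```java
import java.util.function.Function;

// Sketch of replacing setNewEnvironmentHack() with an injectable env provider.
class FpgaDiscovererSketch {
    // Defaults to the real environment; a test swaps in a stub via the setter.
    private Function<String, String> envProvider = System::getenv;

    void setEnvProvider(Function<String, String> provider) {
        this.envProvider = provider;
    }

    String lookup(String name) {
        return envProvider.apply(name);
    }
}

public class Main {
    public static void main(String[] args) {
        FpgaDiscovererSketch discoverer = new FpgaDiscovererSketch();
        // In a test, no process-wide environment hack is needed:
        discoverer.setEnvProvider(name -> "/opt/intel/fpga_sdk");
        System.out.println(discoverer.lookup("SDK_ROOT"));
    }
}
```

Production code never calls the setter, so the default {{System::getenv}} path stays in effect outside tests.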
[jira] [Updated] (YARN-9269) Minor cleanup in FpgaResourceAllocator
[ https://issues.apache.org/jira/browse/YARN-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated YARN-9269: Priority: Minor (was: Major) Hadoop Flags: Reviewed +1, latest patch looks good to me, committing it shortly. > Minor cleanup in FpgaResourceAllocator > -- > > Key: YARN-9269 > URL: https://issues.apache.org/jira/browse/YARN-9269 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Attachments: YARN-9269-001.patch, YARN-9269-002.patch, > YARN-9269-003.patch, YARN-9269-004.patch, YARN-9269-005.patch > > > Some stuff that we observed: > * {{addFpga()}} - we check for duplicate devices, but we don't print any > error/warning if there's any. > * {{findMatchedFpga()}} should be called {{findMatchingFpga()}}. Also, is > this method even needed? We already receive an {{FpgaDevice}} instance in > {{updateFpga()}} which I believe is the same that we're looking up. > * variable {{IPIDpreference}} is confusing > * {{availableFpga}} / {{usedFpgaByRequestor}} are instances of > {{LinkedHashMap}}. What's the rationale behind this? Doesn't a simple > {{HashMap}} suffice? > * {{usedFpgaByRequestor}} should be renamed, naming is a bit unclear > * {{allowedFpgas}} should be an immutable list > * {{@VisibleForTesting}} methods should be package private > * get rid of {{*}} imports -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9409) Port resource type changes from YARN-7237 to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803044#comment-16803044 ] Zhe Zhang commented on YARN-9409: - +1 > Port resource type changes from YARN-7237 to branch-3.0/branch-2 > > > Key: YARN-9409 > URL: https://issues.apache.org/jira/browse/YARN-9409 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9409-YARN-8200.branch3.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803040#comment-16803040 ] Eric Yang commented on YARN-9414: - [~adam.antal] Thank you for the interest in this feature. I have updated the design document to reflect the recent changes; the new file is YARN-Application-Catalog.pdf. > Application Catalog for YARN applications > - > > Key: YARN-9414 > URL: https://issues.apache.org/jira/browse/YARN-9414 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-Application-Catalog.pdf > > > YARN native services provides a web services API to improve the usability of > application deployment on Hadoop using a collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves the usability of YARN for > managing the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9272) Backport YARN-7738 for refreshing max allocation for multiple resource types
[ https://issues.apache.org/jira/browse/YARN-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803039#comment-16803039 ] Zhe Zhang commented on YARN-9272: - +1 > Backport YARN-7738 for refreshing max allocation for multiple resource types > > > Key: YARN-9272 > URL: https://issues.apache.org/jira/browse/YARN-9272 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9272-YARN-8200.001.patch, > YARN-9272-YARN-8200.branch3.001.patch, YARN-9272-YARN-8200.branch3.002.patch > > > Need to port to YARN-8200.branch3 (for branch-3.0) and YARN-8200 (for > branch-2) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9414: Attachment: YARN-Application-Catalog.pdf > Application Catalog for YARN applications > - > > Key: YARN-9414 > URL: https://issues.apache.org/jira/browse/YARN-9414 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-Application-Catalog.pdf > > > YARN native services provides a web services API to improve the usability of > application deployment on Hadoop using a collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves the usability of YARN for > managing the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9124) Resolve contradiction in ResourceUtils: addMandatoryResources / checkMandatoryResources work differently
[ https://issues.apache.org/jira/browse/YARN-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal reassigned YARN-9124: Assignee: Adam Antal > Resolve contradiction in ResourceUtils: addMandatoryResources / > checkMandatoryResources work differently > > > Key: YARN-9124 > URL: https://issues.apache.org/jira/browse/YARN-9124 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Adam Antal >Priority: Minor > > {{ResourceUtils#addMandatoryResources}}: Adds only memory and vcores as > mandatory resources. > {{ResourceUtils#checkMandatoryResources}}: YARN-6620 added some code to this. > This method not only checks memory and vcores, but all the resources referred > to in ResourceInformation#MANDATORY_RESOURCES. > I think it would be good to rename {{MANDATORY_RESOURCES}} to > {{PREDEFINED_RESOURCES}} or something like that and use a similar name for > {{checkMandatoryResources}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8470) Fair scheduler exception with SLS
[ https://issues.apache.org/jira/browse/YARN-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-8470: Assignee: Szilard Nemeth > Fair scheduler exception with SLS > - > > Key: YARN-8470 > URL: https://issues.apache.org/jira/browse/YARN-8470 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Szilard Nemeth >Priority: Major > > I ran into the following exception with sls: > 2018-06-26 13:34:04,358 ERROR resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802910#comment-16802910 ] Peter Bacsko commented on YARN-4901: The failed unit test passed locally several times. > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-4901-001.patch > > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802911#comment-16802911 ] Szilard Nemeth commented on YARN-9401: -- Hi [~jiwq]! Checked your patch, +1 (non-binding) > Fix `yarn version` print the version info is the same as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by in `yarn` shell used `org.apache.hadoop.util.VersionInfo` > instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the > `HADOOP_CLASSNAME` by mistake. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9366) Make logs in TimelineClient implementation specific to application
[ https://issues.apache.org/jira/browse/YARN-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802875#comment-16802875 ] Abhishek Modi commented on YARN-9366: - Thanks for the patch [~prabham]. In YarnException, you are passing all the timeline entities that were not published. The {{toString()}} of TimelineEntities uses the {{toString()}} of TimelineEntity, but that is not overridden. You need to use dumptimelineRecordsToJson to convert the TimelineEntity into a readable format. I also think you should log these entities at debug level only. > Make logs in TimelineClient implementation specific to application > --- > > Key: YARN-9366 > URL: https://issues.apache.org/jira/browse/YARN-9366 > Project: Hadoop YARN > Issue Type: Improvement > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Minor > Attachments: YARN-9366.v1.patch > > > For every container launched on a NM node, a timeline client is created to > publish entities to the corresponding application's timeline collector. And > there would be multiple timeline clients running at the same time. The current > implementation of timeline client logs is insufficient to isolate publishing > problems related to one application. Hence, creating this Jira to improve > the logs in TimelineV2ClientImpl. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
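The review feedback above amounts to two points: render the unpublished records in a readable (e.g. JSON-like) form rather than relying on a default {{Object.toString()}}, and only pay the serialization cost when debug logging is enabled. A minimal stand-alone sketch, with a plain Map and a hand-rolled serializer standing in for TimelineEntity and the real JSON dump helper:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Main {
    // Stands in for LOG.isDebugEnabled(); the real client would ask its logger.
    static boolean debugEnabled = true;

    // Stand-in for a JSON dump helper: renders the record in a readable form.
    static String toReadable(Map<String, Object> entity) {
        StringBuilder sb = new StringBuilder("{");
        entity.forEach((k, v) ->
                sb.append('"').append(k).append("\":\"").append(v).append("\","));
        if (sb.length() > 1) {
            sb.setLength(sb.length() - 1); // drop trailing comma
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> entity = new LinkedHashMap<>();
        entity.put("id", "entity_0001");
        entity.put("type", "YARN_CONTAINER");

        if (debugEnabled) { // serialize only when the log level will show it
            System.out.println("Could not publish entity: " + toReadable(entity));
        }
    }
}
```

The debug guard matters because serializing every failed batch on a busy NodeManager would otherwise add overhead even when nobody reads the output.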
[jira] [Commented] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802866#comment-16802866 ] Adam Antal commented on YARN-9414: -- [~eyang], thanks for making this umbrella jira. I am not really familiar with this area and would like to go through the design document, but I also noticed that it was written in August 2017. Is this the latest document about this feature? > Application Catalog for YARN applications > - > > Key: YARN-9414 > URL: https://issues.apache.org/jira/browse/YARN-9414 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf > > > YARN native services provides a web services API to improve the usability of > application deployment on Hadoop using a collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves the usability of YARN for > managing the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802864#comment-16802864 ] Hadoop QA commented on YARN-4901: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 5s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-4901 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963879/YARN-4901-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 09cdd948c628 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b226958 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/23816/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23816/testReport/ | | Max. process+thread count | 902 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-9124) Resolve contradiction in ResourceUtils: addMandatoryResources / checkMandatoryResources work differently
[ https://issues.apache.org/jira/browse/YARN-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802842#comment-16802842 ] Szilard Nemeth commented on YARN-9124: -- Hi [~adam.antal]! Sure, you can assign it to yourself freely if the assignee is "Unassigned"! > Resolve contradiction in ResourceUtils: addMandatoryResources / > checkMandatoryResources work differently > > > Key: YARN-9124 > URL: https://issues.apache.org/jira/browse/YARN-9124 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Priority: Minor > > {{ResourceUtils#addMandatoryResources}}: Adds only memory and vcores as > mandatory resources. > {{ResourceUtils#checkMandatoryResources}}: YARN-6620 added some code to this. > This method not only checks memory and vcores, but all the resources referred > to in ResourceInformation#MANDATORY_RESOURCES. > I think it would be good to rename {{MANDATORY_RESOURCES}} to > {{PREDEFINED_RESOURCES}} or something like that and use a similar name for > {{checkMandatoryResources}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802840#comment-16802840 ] Szilard Nemeth commented on YARN-4901: -- hi [~pbacsko]! As per our offline talk, there's no easy way to check if the DefaultMetricsSystem is already running, so it's fine to invoke shutdown without checking any condition, as it won't have any consequence. +1 (non-binding) > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-4901-001.patch > > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9124) Resolve contradiction in ResourceUtils: addMandatoryResources / checkMandatoryResources work differently
[ https://issues.apache.org/jira/browse/YARN-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802839#comment-16802839 ] Adam Antal commented on YARN-9124: -- Hi [~snemeth], Are you no longer working on this? May I assign this to myself? > Resolve contradiction in ResourceUtils: addMandatoryResources / > checkMandatoryResources work differently > > > Key: YARN-9124 > URL: https://issues.apache.org/jira/browse/YARN-9124 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Priority: Minor > > {{ResourceUtils#addMandatoryResources}}: Adds only memory and vcores as > mandatory resources. > {{ResourceUtils#checkMandatoryResources}}: YARN-6620 added some code to this. > This method not only checks memory and vcores, but all the resources referred > to in ResourceInformation#MANDATORY_RESOURCES. > I think it would be good to rename {{MANDATORY_RESOURCES}} to > {{PREDEFINED_RESOURCES}} or something like that and use a similar name for > {{checkMandatoryResources}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9353) TestNMWebFilter should be renamed
[ https://issues.apache.org/jira/browse/YARN-9353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802835#comment-16802835 ] Adam Antal commented on YARN-9353: -- Thanks for the patch [~smileLee]. Verified that only the doc got modified, +1 (non-binding). > TestNMWebFilter should be renamed > - > > Key: YARN-9353 > URL: https://issues.apache.org/jira/browse/YARN-9353 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: WEI-HSIAO-LEE >Priority: Trivial > Labels: newbie, newbie++ > Attachments: YARN-9353-trunk.001.patch > > > TestNMWebFilter should be renamed to TestNMWebAppFilter, as > there is no class named NMWebFilter. The javadoc of the class is also > outdated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802833#comment-16802833 ] Jim Brennan commented on YARN-7848: --- I think I agree with [~eyang] on the use of {{-f}}. By the time we are trying to remove the container, we have already tried to kill the process and stop the container, so I don't think there is any danger in using the -f option, and it may succeed in cases where it otherwise doesn't now. I can't think of anything bad that would happen by using the force option every time in our use cases. > Force removal of docker containers that do not get removed on first try > --- > > Key: YARN-7848 > URL: https://issues.apache.org/jira/browse/YARN-7848 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-7848.001.patch > > > After the addition of YARN-5366, containers will get removed after a certain > debug delay. However, this is a one-time effort. If the removal fails for > whatever reason, the container will persist. We need to add a mechanism for a > forced removal of those containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9417) Implement FS equivalent of AppNameMappingPlacementRule
Wilfred Spiegelenburg created YARN-9417: --- Summary: Implement FS equivalent of AppNameMappingPlacementRule Key: YARN-9417 URL: https://issues.apache.org/jira/browse/YARN-9417 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.3.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The AppNameMappingPlacementRule is only available for the CS. We need the same kind of rule for the FS. The rule should use the application name as set in the submission context. This allows spark, mr or tez jobs to be run in their own queues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9416) Add filter options to FS placement rules
[ https://issues.apache.org/jira/browse/YARN-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802761#comment-16802761 ] Wilfred Spiegelenburg edited comment on YARN-9416 at 3/27/19 1:22 PM: -- The proposal is to add a new child entry to all rules, like the parent rule we have now. Name of the XML node: * filter Names of the attributes supported: * type (_allow_ or _deny_) * users (comma-separated list) * groups (comma-separated _ordered_ list) The type attribute is required. Either the users or the groups attribute can be omitted or left empty. If both are left empty the filter is ignored. The ordering only has an impact on the secondary group rule, and thus the group filter, in combination with the _allow_ type. That is the only rule that loops over a number of values that are returned in a random order by the OS. The order in which the list is specified will be the order in which the secondary groups are evaluated in the rule. When a rule has a filter set we check the filter before we decide whether the queue found will be returned. This is independent of the ACLs. was (Author: wilfreds): The proposal is to add a new child entry to all rules, like the parent rule we have now. Name of the XML node: * userfilter * groupfilter Names of the attributes supported for each: * type (order, allow or deny) * members (comma-separated ordered list) When a rule has a filter set we check the filter before we decide whether the queue found will be returned. This is independent of the ACLs. > Add filter options to FS placement rules > > > Key: YARN-9416 > URL: https://issues.apache.org/jira/browse/YARN-9416 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > The placement rules should allow filtering of the groups and/or users that > match the rule. 
> In the case of the user rule you might want it to only match if the users are > members of a specific group. Another example would be to only allow specific > users to match the specified rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
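A hypothetical fair-scheduler.xml fragment sketching the proposed filter child entry described in the comment above. The filter element and its type, users and groups attributes are the names from the proposal; the rule and user/group names are made up:

```xml
<queuePlacementPolicy>
  <!-- Proposed syntax sketch: only the listed users may match this rule -->
  <rule name="specified" create="false">
    <filter type="allow" users="alice,bob" />
  </rule>
  <!-- Secondary group rule: the ordered group list would control the
       order in which the secondary groups are evaluated -->
  <rule name="secondaryGroupExistingQueue">
    <filter type="allow" groups="etl,analytics" />
  </rule>
  <rule name="default" />
</queuePlacementPolicy>
```

Per the proposal, the filter would be checked before the queue found by the rule is returned, independently of queue ACLs.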
[jira] [Commented] (YARN-9416) Add filter options to FS placement rules
[ https://issues.apache.org/jira/browse/YARN-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802761#comment-16802761 ] Wilfred Spiegelenburg commented on YARN-9416: - The proposal is to add a new child entry to all rules, like the parent rule we have now. Name of the XML node: * userfilter * groupfilter Names of the attributes supported for each: * type (order, allow or deny) * members (comma-separated ordered list) When a rule has a filter set we check the filter before we decide whether the queue found will be returned. This is independent of the ACLs. > Add filter options to FS placement rules > > > Key: YARN-9416 > URL: https://issues.apache.org/jira/browse/YARN-9416 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.3.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > > The placement rules should allow filtering of the groups and/or users that > match the rule. > In the case of the user rule you might want it to only match if the users are > members of a specific group. Another example would be to only allow specific > users to match the specified rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9416) Add filter options to FS placement rules
Wilfred Spiegelenburg created YARN-9416: --- Summary: Add filter options to FS placement rules Key: YARN-9416 URL: https://issues.apache.org/jira/browse/YARN-9416 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 3.3.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The placement rules should allow filtering of the groups and/or users that match the rule. In the case of the user rule you might want it to only match if the users are members of a specific group. Another example would be to only allow specific users to match the specified rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8793) QueuePlacementPolicy bind more information to assigning result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802741#comment-16802741 ] Hadoop QA commented on YARN-8793: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8793 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8793 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940551/YARN-8793.008.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23818/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > QueuePlacementPolicy bind more information to assigning result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch, YARN-8793.002.patch, > YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, > YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802731#comment-16802731 ] Peter Bacsko commented on YARN-4901: [~snemeth] [~templedf] Could you look at this short patch? I also added {{DefaultMetricsSystem.shutdown()}} because it unregisters an object on JMX. If we don't do this, we might get: {noformat} org.apache.hadoop.metrics2.MetricsException: org.apache.hadoop.metrics2.MetricsException: Hadoop:service=ResourceManager,name=RMNMInfo already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:135) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:110) at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:123) at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:64) at org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.<init>(RMNMInfo.java:59) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:749) ... {noformat} > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-4901-001.patch > > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
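The duplicate-registration failure quoted above can be illustrated generically. This is a made-up mini-registry, not the Hadoop metrics API: it only mimics the behaviour where registering the same JMX name twice throws unless a shutdown step unregisters it first, which is why {{DefaultMetricsSystem.shutdown()}} is needed between MockRM instances:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry mimicking JMX bean-name registration:
// registering a name twice fails unless shutdown() runs in between.
public class MBeanRegistry {
    private static final Map<String, Object> beans = new HashMap<>();

    public static void register(String name, Object bean) {
        if (beans.containsKey(name)) {
            // Mirrors "Hadoop:service=...,name=RMNMInfo already exists!"
            throw new IllegalStateException(name + " already exists!");
        }
        beans.put(name, bean);
    }

    // Analogous to DefaultMetricsSystem.shutdown(): clears all
    // registrations so the next "RM instance" can register again.
    public static void shutdown() {
        beans.clear();
    }
}
```

Without the shutdown() call, a second registration of the same name fails exactly like the second MockRM start in the stack trace.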
[jira] [Commented] (YARN-8793) QueuePlacementPolicy bind more information to assigning result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802721#comment-16802721 ] Wilfred Spiegelenburg commented on YARN-8793: - The PlacementRule and PlacementManager have standardised the way a chain is terminated and what is communicated back. The FS has moved to using those interfaces to handle queue placements. Placements are handled outside the scheduler. > QueuePlacementPolicy bind more information to assigning result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch, YARN-8793.002.patch, > YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, > YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to the > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5387) FairScheduler: add the ability to specify a parent queue to all placement rules
[ https://issues.apache.org/jira/browse/YARN-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YARN-5387. - Resolution: Implemented This has been included as part of the YARN-8967 changes. Documentation is still outstanding and will be added as part of YARN-9415. > FairScheduler: add the ability to specify a parent queue to all placement > rules > --- > > Key: YARN-5387 > URL: https://issues.apache.org/jira/browse/YARN-5387 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: supportability > > In the current placement policy all rules generate a queue name under > the root. The only exception is the nestedUserQueue rule. This rule allows a > queue to be created under a parent queue defined by a second rule. > Instead of creating new rules to also allow nested groups, secondary groups > or nested queues, we should generalise this by > allowing a parent attribute to be specified in each rule, like the create flag. > The optional parent attribute for a rule should allow the following values: > - empty (which is the same as not specifying the attribute) > - a rule > - a fixed value (with or without the root prefix) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
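A hypothetical fair-scheduler.xml sketch of the generalised parent attribute described above. The attribute name and the three value forms (empty, a rule, a fixed value) come from the issue description; the exact syntax shipped in YARN-8967 may differ, and the queue names are made up:

```xml
<queuePlacementPolicy>
  <!-- parent is a rule: place the user queue under the primary group queue,
       replacing the old nestedUserQueue two-rule construct -->
  <rule name="user" create="true" parent="primaryGroup" />
  <!-- parent is a fixed value (with or without the root. prefix) -->
  <rule name="user" create="true" parent="root.users" />
  <!-- parent empty or omitted: same as today, queue created under root -->
  <rule name="default" />
</queuePlacementPolicy>
```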
[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802713#comment-16802713 ] Hadoop QA commented on YARN-8795: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8795 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8795 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12940534/YARN-8795.004.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23817/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8795.002.patch, YARN-8795.003.patch, > YARN-8795.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802708#comment-16802708 ] Wilfred Spiegelenburg commented on YARN-8795: - The rules have been moved as part of the migration to a new interface. The rules now all use the PlacementRule interface and are located in their own files. > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8795.002.patch, YARN-8795.003.patch, > YARN-8795.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy
[ https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802707#comment-16802707 ] Wilfred Spiegelenburg commented on YARN-8792: - None of these changes fit into the integrated way we currently implement the rules in the FS and CS. As part of YARN-8948, YARN-9298 and finally integrated in YARN-8967 this has been changed. Both schedulers now use the same placement manager and placement rule code. The placement of the application in a queue has also been moved out of the FS. > Revisit FairScheduler QueuePlacementPolicy > --- > > Key: YARN-8792 > URL: https://issues.apache.org/jira/browse/YARN-8792 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > > Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. There > are several problems: > # The termination of the responsibility chain should bind to the assigning > result instead of the rule. > # It should provide a reason when rejecting a request. > # Still need more useful rules: > ## RejectNonLeafQueue > ## RejectDefaultQueue > ## RejectUsers > ## RejectQueues > ## DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues
[ https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YARN-2257. - Resolution: Duplicate This has been fixed as part of YARN-8948, YARN-9298 and finally integrated in YARN-8967. Both schedulers use the same placement manager and placement rule code. The rules are different for both schedulers as the FS uses a slightly different setup with rule chaining and creation of queues that do not exist. The fix is in 3.3 and later: marking this as a duplicate of YARN-8967 > Add user to queue mappings to automatically place users' apps into specific > queues > -- > > Key: YARN-2257 > URL: https://issues.apache.org/jira/browse/YARN-2257 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Patrick Liu >Assignee: Vinod Kumar Vavilapalli >Priority: Major > Labels: features > > Currently, the fair-scheduler supports two modes, default queue or individual > queue for each user. > Apparently, the default queue is not a good option, because the resources > cannot be managed for each user or group. > However, individual queue for each user is not good enough. Especially when > connecting YARN with Hive. There will be an increasing number of Hive users in a corporate > environment. If we create a queue for a user, the resource management will be > hard to maintain. > I think the problem can be solved like this: > 1. Define user->queue mapping in Fair-Scheduler.xml. Inside each queue, use > aclSubmitApps to control users' ability to submit. > 2. Each time a user submits an app to YARN, if the user has mapped to a queue, > the app will be scheduled to that queue; otherwise, the app will be submitted > to the default queue. > 3. If the user cannot pass the aclSubmitApps limits, the app will not be accepted. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
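The mapping scheme proposed in the description above can be sketched with the Fair Scheduler's existing placement policy plus per-queue submit ACLs. A minimal fair-scheduler.xml fragment; the queue, user and group names are made up:

```xml
<allocations>
  <queue name="etl">
    <!-- aclSubmitApps format is "userlist grouplist";
         a leading space means no users, only the etl group may submit -->
    <aclSubmitApps> etl</aclSubmitApps>
  </queue>
  <queue name="default" />
  <queuePlacementPolicy>
    <!-- use the queue named at submission time, but do not create it -->
    <rule name="specified" create="false" />
    <!-- otherwise fall back to the default queue -->
    <rule name="default" />
  </queuePlacementPolicy>
</allocations>
```

An app submitted to queue "etl" by a non-member is rejected by the ACL; an app with no queue set lands in "default", matching points 1-3 of the proposal.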
[jira] [Updated] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-4901: --- Attachment: YARN-4901-001.patch > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-4901-001.patch > > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4901) MockRM should clear the QueueMetrics when it starts
[ https://issues.apache.org/jira/browse/YARN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802695#comment-16802695 ] Peter Bacsko commented on YARN-4901: It also affects {{TestApplicationLauncher.testAMLaunchAndCleanup}}. > MockRM should clear the QueueMetrics when it starts > --- > > Key: YARN-4901 > URL: https://issues.apache.org/jira/browse/YARN-4901 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Daniel Templeton >Assignee: Peter Bacsko >Priority: Major > > The {{ResourceManager}} rightly assumes that when it starts, it's starting > from naught. The {{MockRM}}, however, violates that assumption. For > example, in {{TestNMReconnect}}, each test method creates a new {{MockRM}} > instance. The {{QueueMetrics.queueMetrics}} field is static, which means > that when multiple {{MockRM}} instances are created, the {{QueueMetrics}} > bleed over. Having the MockRM clear the {{QueueMetrics}} when it starts > should resolve the issue. I haven't looked yet at scope to see how hard or easy > that is to do. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6389) [ATSv2] metricslimit query parameter do not work
[ https://issues.apache.org/jira/browse/YARN-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802691#comment-16802691 ] Prabhu Joseph commented on YARN-6389: - [~rohithsharma] Have tested metricslimit with hbase-1.2.6 and hadoop-3.2.0 and it is working fine. Can you check if there is any difference in the test. *Rest Api Output:* {code:java} [hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553684950097_0001?user.name=hbase&fields=METRICS&metricstoretrieve=YARN_APPLICATION_MEM_PREEMPT_METRIC&metricslimit=3" | jq . { "metrics": [ { "type": "TIME_SERIES", "id": "YARN_APPLICATION_MEM_PREEMPT_METRIC", "aggregationOp": "NOP", "values": { "1553684992594": 51481, "1553684992593": 51480, "1553684992592": 51479 } } ] } Have changed YARN_APPLICATION_MEM_PREEMPT_METRIC to TIME_SERIES type for testing purposes.{code} *Hbase Shell Output:* {code:java} hbase(main):006:0* scan 't1' ROW COLUMN+CELL row column=f1:c, timestamp=1553685646951, value=value1 1 row(s) in 0.0740 seconds hbase(main):010:0* scan 't1', { VERSIONS => 2} ROW COLUMN+CELL row column=f1:c, timestamp=1553685646951, value=value1 row column=f1:c, timestamp=1553685642756, value=value 1 row(s) in 0.0150 seconds{code} > [ATSv2] metricslimit query parameter do not work > > > Key: YARN-6389 > URL: https://issues.apache.org/jira/browse/YARN-6389 > Project: Hadoop YARN > Issue Type: Bug > Components: timelinereader >Reporter: Rohith Sharma K S >Assignee: Varun Saxena >Priority: Critical > > It is observed that the metricslimit query parameter does not work. Also, even though > by default metricslimit is set to 1, all the metric versions are retrieved. > One thing I noticed is that even though GenericEntityReader is setting > Scan.setMaxVersions(), all the metrics are retrieved. It appears something is > wrong in the TimelineWriter or the way HBase filters are being used. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9269) Minor cleanup in FpgaResourceAllocator
[ https://issues.apache.org/jira/browse/YARN-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802618#comment-16802618 ] Peter Bacsko commented on YARN-9269: [~devaraj.k] [~sunilg] Could you guys review + commit it if it looks good? Thanks! > Minor cleanup in FpgaResourceAllocator > -- > > Key: YARN-9269 > URL: https://issues.apache.org/jira/browse/YARN-9269 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9269-001.patch, YARN-9269-002.patch, > YARN-9269-003.patch, YARN-9269-004.patch, YARN-9269-005.patch > > > Some stuff that we observed: > * {{addFpga()}} - we check for duplicate devices, but we don't print any > error/warning if there's any. > * {{findMatchedFpga()}} should be called {{findMatchingFpga()}}. Also, is > this method even needed? We already receive an {{FpgaDevice}} instance in > {{updateFpga()}} which I believe is the same that we're looking up. > * variable {{IPIDpreference}} is confusing > * {{availableFpga}} / {{usedFpgaByRequestor}} are instances of > {{LinkedHashMap}}. What's the rationale behind this? Doesn't a simple > {{HashMap}} suffice? > * {{usedFpgaByRequestor}} should be renamed, naming is a bit unclear > * {{allowedFpgas}} should be an immutable list > * {{@VisibleForTesting}} methods should be package private > * get rid of {{*}} imports -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
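On the immutable-list and duplicate-warning points in the review list above, a generic illustration. The class, field and method names here are made up for the sketch, not the actual FpgaResourceAllocator code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: warn on duplicate devices and expose the
// device list as an unmodifiable view so callers cannot mutate it.
public class DeviceRegistry {
    private final List<String> allowedFpgas = new ArrayList<>();

    public void addFpga(String id) {
        // Log a warning instead of silently dropping the duplicate
        if (allowedFpgas.contains(id)) {
            System.err.println("WARN: duplicate FPGA device ignored: " + id);
            return;
        }
        allowedFpgas.add(id);
    }

    // Callers get a read-only view; add/remove throws
    // UnsupportedOperationException instead of corrupting state.
    public List<String> getAllowedFpgas() {
        return Collections.unmodifiableList(allowedFpgas);
    }
}
```

Returning an unmodifiable view (or an immutable copy) is the usual way to keep an internal list like {{allowedFpgas}} safe from external mutation without extra locking.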
[jira] [Created] (YARN-9415) Document FS placement rule changes from YARN-8967
Wilfred Spiegelenburg created YARN-9415: --- Summary: Document FS placement rule changes from YARN-8967 Key: YARN-9415 URL: https://issues.apache.org/jira/browse/YARN-9415 Project: Hadoop YARN Issue Type: Improvement Components: documentation, fairscheduler Affects Versions: 3.3.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg With the changes introduced by YARN-8967 we now allow parent rules on all existing rules. This should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6567) Flexible Workload Management
[ https://issues.apache.org/jira/browse/YARN-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg reassigned YARN-6567: --- Assignee: Wilfred Spiegelenburg > Flexible Workload Management > > > Key: YARN-6567 > URL: https://issues.apache.org/jira/browse/YARN-6567 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Ajai Omtri >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: features > > Yarn workload management can be a little more dynamic. > 1. Create yarn pool by specifying more than one Secondary AD group. > Scenario: > In a multi-tenant cluster there can be hundreds of AD groups per tenant and > hundreds of users per AD group. We want a way to group like workloads into a > single yarn pool by specifying multiple secondary AD Groups. > Ex: All the ETL workloads of tenants need to go into one yarn pool. This > requires addition of all ETL related AD groups into one yarn pool. > 2. Demotions > Scenario: A particular workload/job has been started in a high priority yarn > pool based on the assumption that it would finish quickly but due to some > data issue/change in the code/query etc. - now it is running longer and > consuming high amounts of resources for a long time. In this case we want to > demote this to a lower resource allocated yarn pool. We don't want this one > run-away workload/job to dominate the cluster because our assumption was > wrong. > Ex: If any workload in the yarn pool runs for X minutes and/or consumes Y > resources either alert me or push to another yarn pool. Users can keep > demoting and can push to a yarn pool which has capped resources - like a > Penalty box. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9290) Invalid SchedulingRequest not rejected in Scheduler PlacementConstraintsHandler
[ https://issues.apache.org/jira/browse/YARN-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9290: Affects Version/s: 3.2.0 > Invalid SchedulingRequest not rejected in Scheduler > PlacementConstraintsHandler > > > Key: YARN-9290 > URL: https://issues.apache.org/jira/browse/YARN-9290 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9290-001.patch, YARN-9290-002.patch, > YARN-9290-003.patch > > > SchedulingRequest with Invalid namespace is not rejected in Scheduler > PlacementConstraintsHandler. RM keeps on trying to allocateOnNode with > logging the exception. This is rejected in case of placement-processor > handler. > {code} > 2019-02-08 16:51:27,548 WARN > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator: > Failed to query node cardinality: > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.InvalidAllocationTagsQueryException: > Invalid namespace prefix: notselfi, valid values are: > all,not-self,app-id,app-tag,self > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.fromString(TargetApplicationsNamespace.java:277) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.parse(TargetApplicationsNamespace.java:234) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.AllocationTags.createAllocationTags(AllocationTags.java:93) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraintExpression(PlacementConstraintsUtil.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraint(PlacementConstraintsUtil.java:240) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:321) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyAndConstraint(PlacementConstraintsUtil.java:272) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:324) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.checkCardinalityAndPending(SingleConstraintAppPlacementAllocator.java:355) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.precheckNode(SingleConstraintAppPlacementAllocator.java:395) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.precheckNode(AppSchedulingInfo.java:779) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.preCheckForNodeCandidateSet(RegularContainerAllocator.java:145) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:890) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:977) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1173) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:795) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1630) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1624) > at >
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802513#comment-16802513 ] Weiwei Yang commented on YARN-9413: --- Thanks for raising this issue, the fix makes sense. Also thanks for adding tests to cover this scenario. Can you submit the patch to trigger jenkins job? > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after AM finished at first time (am-max-attempts=1). > # App is started with 2 containers running on NM1 node. > # Fail the AM of the application with PREEMPTED exit status which should not > count towards max attempt retry but app will fail immediately. > # Used resource of this queue leaks after app fail. > The root cause is the inconsistency of handling app attempt failure between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After app fail, RMAppFailedAttemptEvent will be sent in > RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it > will not count towards max attempt retry, so that it will send > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handle > RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1. 
> # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in > CapacityScheduler#doneApplicationAttempt, it will skip killing and calling > the completion process for containers belonging to this app, so that the queue resource > leak happens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org