[jira] [Commented] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433076#comment-17433076 ]

Ahmed Hussein commented on YARN-1115:
-------------------------------------

Thanks [~epayne] for working on the fix and providing the patches for the affected branches. I committed the patches to all the branches and marked the Jira as fixed.

> Provide optional means for a scheduler to check real user ACLs
> ---------------------------------------------------------------
>
>                 Key: YARN-1115
>                 URL: https://issues.apache.org/jira/browse/YARN-1115
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, scheduler
>    Affects Versions: 2.8.5
>            Reporter: Eric Payne
>            Priority: Major
>             Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
>         Attachments: YARN-1115.001.patch, YARN-1115.002.patch,
>                      YARN-1115.003.patch, YARN-1115.004.patch,
>                      YARN-1115.branch-2.10.004.patch,
>                      YARN-1115.branch-3.2.004.patch,
>                      YARN-1115.branch-3.3.004.patch
>
>
> In the framework for secure impersonation using UserGroupInformation.doAs
> (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html),
> a trusted superuser can submit jobs on behalf of another user in a secure
> way. In this framework, the superuser is referred to as the real user and
> the proxied user is referred to as the effective user.
> Currently, when a job is submitted as an effective user, the ACLs for the
> effective user are checked against the queue on which the job is to be run.
> If an optional configuration is set, the scheduler should also check the
> ACLs of the real user.
> For example, suppose my superuser name is super, and super is configured to
> securely proxy as joe. Also suppose there is a Hadoop queue named ops which
> only allows ACLs for super, not for joe.
> When super proxies to joe in order to submit a job to the ops queue, the
> submission will fail because joe, as the effective user, does not have ACLs
> on the ops queue. In many cases this is what you want, in order to protect
> queues that joe should not be using.
> However, there are times when super may need to proxy to many users, and
> the client running as super just wants to use the ops queue because the ops
> queue is already dedicated to the client's purpose; to keep the ops queue
> dedicated to that purpose, super doesn't want to open up ACLs to joe in
> general on the ops queue. Without this functionality, the client running as
> super needs to figure out which queue each user has ACLs opened up for, and
> then coordinate with other tasks using those queues.
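For context, the impersonation flow described above looks roughly like the following minimal sketch. {{submitJobToOpsQueue()}} is a hypothetical placeholder for the client's actual submission code; today the queue ACL check sees only the effective user ("joe"), and the proposal is to optionally check the real user ("super") as well.

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// The process is logged in as the trusted superuser ("super", the real
// user) and impersonates "joe" (the effective user) for the submission.
UserGroupInformation realUser = UserGroupInformation.getLoginUser();
UserGroupInformation proxyUgi =
    UserGroupInformation.createProxyUser("joe", realUser);

proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
  // Queue ACLs are currently evaluated against "joe" only.
  submitJobToOpsQueue(); // hypothetical placeholder
  return null;
});
{code}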
[jira] [Updated] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein updated YARN-1115:
--------------------------------
    Fix Version/s: 2.10.2
[jira] [Updated] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein updated YARN-1115:
--------------------------------
    Fix Version/s: 3.2.4
[jira] [Updated] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein updated YARN-1115:
--------------------------------
    Fix Version/s: 3.3.2
[jira] [Comment Edited] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432531#comment-17432531 ]

Ahmed Hussein edited comment on YARN-1115 at 10/21/21, 2:47 PM:
----------------------------------------------------------------

I was not able to build branch-3.3, hitting HADOOP-17650. [~epayne], were you able to build branch-3.3 locally?

was (Author: ahussein):
I was not able to build branch-3.3 hitting HADOOP-17650
[jira] [Commented] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432531#comment-17432531 ]

Ahmed Hussein commented on YARN-1115:
-------------------------------------

I was not able to build branch-3.3, hitting HADOOP-17650.
[jira] [Commented] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432135#comment-17432135 ]

Ahmed Hussein commented on YARN-1115:
-------------------------------------

Thanks [~epayne] for providing the patch. I merged 004 into trunk.
[jira] [Updated] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein updated YARN-1115:
--------------------------------
    Fix Version/s: 3.4.0
[jira] [Commented] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429324#comment-17429324 ]

Ahmed Hussein commented on YARN-1115:
-------------------------------------

Thanks [~epayne]! For YARN-1115.002.patch:
* there are some checkstyle errors.
* I guess the cc and javac errors are unrelated.

I thought that patch submission had some issues, since all support went to the GitHub precommit jobs.
[jira] [Commented] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427998#comment-17427998 ]

Ahmed Hussein commented on YARN-1115:
-------------------------------------

[~epayne], thanks for providing the patch! The changes have been successfully applied internally for quite some time. +1

P.S.: I suspect that the test-patch is not reporting checkstyle correctly, so I would suggest resubmitting the patch after rebasing, or creating a new PR.
[jira] [Commented] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414542#comment-17414542 ]

Ahmed Hussein commented on YARN-10935:
--------------------------------------

Thanks [~epayne] for the fix. I find the one-liner fix clever and minimalist. +1 (non-binding)

> AM Total Queue Limit goes below per-user AM Limit if parent is full.
> ---------------------------------------------------------------------
>
>                 Key: YARN-10935
>                 URL: https://issues.apache.org/jira/browse/YARN-10935
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, capacityscheduler
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>            Priority: Major
>         Attachments: Screen Shot 2021-09-07 at 12.49.52 PM.png,
>                      Screen Shot 2021-09-07 at 12.55.37 PM.png,
>                      YARN-10935.001.patch, YARN-10935.002.patch,
>                      YARN-10935.003.patch
>
>
> This happens when DRF is enabled and all of one resource is consumed but
> the second resource still has plenty available.
> This is reproducible by setting up a parent queue whose capacity and max
> capacity are the same, with 2 or more sub-queues whose max capacity is
> 100%. In one of the sub-queues, start a long-running app that consumes all
> resources in the parent queue's hierarchy. This app will consume all of the
> memory but not very many vcores (for example).
> In a second queue, submit an app. The *{{Max Application Master Resources
> Per User}}* limit is much higher than the *{{Max Application Master
> Resources}}* limit.
[jira] [Resolved] (YARN-10566) Elapsed time should be measured monotonicNow
[ https://issues.apache.org/jira/browse/YARN-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein resolved YARN-10566.
----------------------------------
    Release Note: see discussions in HADOOP-15901
      Resolution: Won't Fix

> Elapsed time should be measured monotonicNow
> ---------------------------------------------
>
>                 Key: YARN-10566
>                 URL: https://issues.apache.org/jira/browse/YARN-10566
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>             Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I noticed widespread incorrect usage of {{System.currentTimeMillis()}}
> throughout the YARN code.
> For example:
> {code:java}
> // Some comments here
> long start = System.currentTimeMillis();
> while (System.currentTimeMillis() - start < timeout) {
>   // Do something
> }
> {code}
> Elapsed time should be measured using {{monotonicNow()}}.
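The corrected pattern would look roughly like the following sketch, using {{org.apache.hadoop.util.Time}} from hadoop-common ({{timeout}} is a placeholder variable):

{code:java}
import org.apache.hadoop.util.Time;

// monotonicNow() is immune to wall-clock adjustments (NTP corrections,
// manual clock changes), so elapsed-time arithmetic stays correct.
long start = Time.monotonicNow();
while (Time.monotonicNow() - start < timeout) {
  // Do something
}
{code}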
[jira] [Commented] (YARN-10733) TimelineService Hbase tests are failing with timeout error on branch-2.10
[ https://issues.apache.org/jira/browse/YARN-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322426#comment-17322426 ]

Ahmed Hussein commented on YARN-10733:
--------------------------------------

Thanks for the review and for committing the PR!

> TimelineService Hbase tests are failing with timeout error on branch-2.10
> --------------------------------------------------------------------------
>
>                 Key: YARN-10733
>                 URL: https://issues.apache.org/jira/browse/YARN-10733
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test, timelineserver, yarn
>    Affects Versions: 2.10.0
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>             Labels: pull-request-available
>             Fix For: 2.10.2
>
>         Attachments: 2021-04-12T12-40-21_403-jvmRun1.dump,
>                      2021-04-12T12-40-58_857.dumpstream,
>                      org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction-output.txt.zip
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:bash}
> 03:54:41 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.2:test (default-test) on project hadoop-yarn-server-timelineservice-hbase-tests: There was a timeout or other error in the fork -> [Help 1]
> 03:54:41 [ERROR]
> 03:54:41 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> 03:54:41 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> 03:54:41 [ERROR]
> 03:54:41 [ERROR] For more information about the errors and possible solutions, please read the following articles:
> 03:54:41 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> 03:54:41 [ERROR]
> 03:54:41 [ERROR] After correcting the problems, you can resume the build with the command
> 03:54:41 [ERROR]   mvn <args> -rf :hadoop-yarn-server-timelineservice-hbase-tests
> {code}
> Failure of the tests is due to the test unit
> {{TestHBaseStorageFlowRunCompaction}} getting stuck.
> Upon checking the surefire reports, I found several ClassNotFoundExceptions.
> {code:bash}
> Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/CanUnbuffer
>         at java.lang.ClassLoader.defineClass1(Native Method)
>         at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>         at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at org.apache.hadoop.hbase.regionserver.StoreFileInfo.<init>(StoreFileInfo.java:66)
>         at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:698)
>         at org.apache.hadoop.hbase.regionserver.HStore.validateStoreFile(HStore.java:1895)
>         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1009)
>         at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2638)
>         ... 33 more
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.CanUnbuffer
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 51 more
> {code}
> and
> {code:bash}
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.regionserver.StoreFileInfo
>         at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:698)
>         at org.apache.hadoop.hbase.regionserver.HStore.validateStoreFile(HStore.java:1895)
>         at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1009)
>         at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2638)
>         ... 10 more
> {code}
[jira] [Updated] (YARN-10733) TimelineService Hbase tests are failing with timeout error on branch-2.10
[ https://issues.apache.org/jira/browse/YARN-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein updated YARN-10733:
---------------------------------
    Affects Version/s:     (was: 2.10.2)
                       2.10.0
[jira] [Updated] (YARN-10733) TimelineService Hbase tests are failing with timeout error on branch-2.10
[ https://issues.apache.org/jira/browse/YARN-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein updated YARN-10733:
---------------------------------
    Affects Version/s: 2.10.2
[jira] [Assigned] (YARN-10733) TimelineService Hbase tests are failing with timeout error on branch-2.10
[ https://issues.apache.org/jira/browse/YARN-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Hussein reassigned YARN-10733:
------------------------------------
    Assignee: Ahmed Hussein
[jira] [Created] (YARN-10733) TimelineService Hbase tests are failing with timeout error on branch-2.10
Ahmed Hussein created YARN-10733:
------------------------------------

             Summary: TimelineService Hbase tests are failing with timeout error on branch-2.10
                 Key: YARN-10733
                 URL: https://issues.apache.org/jira/browse/YARN-10733
             Project: Hadoop YARN
          Issue Type: Bug
          Components: test, timelineserver, yarn
            Reporter: Ahmed Hussein
         Attachments: 2021-04-12T12-40-21_403-jvmRun1.dump,
                      2021-04-12T12-40-58_857.dumpstream,
                      org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction-output.txt.zip
[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309080#comment-17309080 ]

Ahmed Hussein commented on YARN-10501:
--------------------------------------

findbugs is not supported. We need to pull HADOOP-16870 into branch-2.10.
https://issues.apache.org/jira/browse/HADOOP-16870?focusedCommentId=17309077&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17309077

> Can't remove all node labels after add node label without nodemanager port
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10501
>                 URL: https://issues.apache.org/jira/browse/YARN-10501
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.4.0
>            Reporter: caozhiqiang
>            Assignee: caozhiqiang
>            Priority: Critical
>             Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
>         Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch,
>                      YARN-10501.003.patch, YARN-10501.004.patch,
>                      YARN-10502-branch-2.10.002.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the
> WILDCARD_PORT (0) port, not all of the label info on these nodes can be
> removed.
> Reproduce process:
> {code:java}
> 1. yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2. yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3. curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4. yarn rmadmin -replaceLabelsOnNode "server001"
> 5. curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
> {code}
> You can see that after step 4 removes the nodemanager labels, the label
> info is still in the node info.
> {code:java}
> 641       case REPLACE:
> 642         replaceNodeForLabels(nodeId, host.labels, labels);
> 643         replaceLabelsForNode(nodeId, host.labels, labels);
> 644         host.labels.clear();
> 645         host.labels.addAll(labels);
> 646         for (Node node : host.nms.values()) {
> 647           replaceNodeForLabels(node.nodeId, node.labels, labels);
> 649           node.labels = null;
> 650         }
> 651         break;
> {code}
> The cause is in line 647: when labels are added to a node without a port,
> both the 0 port and the real NM port are added to the node info, and when
> labels are removed, the parameter node.labels in line 647 is null, so the
> old labels are not removed.
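For illustration only (this is not the committed patch): one plausible shape of a fix, inferred from the description above, is to fall back to the host's previous labels when a per-port Node entry carries none of its own. The method names and fields are taken from the snippet quoted above; {{oldHostLabels}} is a hypothetical local variable.

{code:java}
case REPLACE:
  // Hypothetical sketch: snapshot the host-level labels before they are
  // overwritten, so per-port Node entries whose 'labels' field is null
  // can still have their old labels removed.
  Set<String> oldHostLabels = new HashSet<>(host.labels);
  replaceNodeForLabels(nodeId, host.labels, labels);
  replaceLabelsForNode(nodeId, host.labels, labels);
  host.labels.clear();
  host.labels.addAll(labels);
  for (Node node : host.nms.values()) {
    Set<String> oldLabels =
        (node.labels != null) ? node.labels : oldHostLabels;
    replaceNodeForLabels(node.nodeId, oldLabels, labels);
    node.labels = null;
  }
  break;
{code}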
[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308884#comment-17308884 ]

Ahmed Hussein commented on YARN-10597:
--------------------------------------

Thanks [~shuzirra] for the fix. +1 (non-binding)

> CSMappingPlacementRule should not create new instance of Groups
> ----------------------------------------------------------------
>
>                 Key: YARN-10597
>                 URL: https://issues.apache.org/jira/browse/YARN-10597
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gergely Pollak
>            Assignee: Gergely Pollak
>            Priority: Major
>         Attachments: YARN-10597.001.patch, YARN-10597.002.patch
>
>
> As [~ahussein] pointed out in YARN-10425, no new Groups instance should be
> created.
[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305024#comment-17305024 ]

Ahmed Hussein commented on YARN-10597:
--------------------------------------

That's interesting. I ran the unit tests in YARN-10425 from IntelliJ and they all passed.

Just a quick question: in {{CSMappingPlacementRule.java}}, aren't we supposed to pass the configuration object to {{Groups.getUserToGroupsMappingService}}? I am considering the case when the singleton has not been initialized. In that case {{Groups.getUserToGroupsMappingService}} won't parse the {{HADOOP_SECURITY_GROUP_MAPPING}} parameters set inside {{conf}}.

{code:java}
-      groups = Groups.getUserToGroupsMappingService();
+      groups = Groups.getUserToGroupsMappingService(conf);
{code}
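To illustrate the concern (a minimal sketch, not code from the patch): the Groups singleton is created lazily from whatever Configuration reaches it first, so if the no-arg overload wins that race, a custom group mapping set in {{conf}} is silently ignored.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

Configuration conf = new Configuration();
// Custom group-mapping implementation that only 'conf' knows about.
conf.set("hadoop.security.group.mapping",
    "org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback");

// If this no-arg call runs first, the singleton is built from a default
// Configuration and the setting above never takes effect...
Groups first = Groups.getUserToGroupsMappingService();

// ...and this later call returns the SAME instance, ignoring 'conf'.
Groups second = Groups.getUserToGroupsMappingService(conf);
assert first == second;
{code}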
[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304445#comment-17304445 ] Ahmed Hussein commented on YARN-10597: -- Thanks [~shuzirra] for the patch. It is fine to ignore the error of the init tests. It should be enough to verify against the tests affected by YARN-10425. I am +1 (non-binding) > CSMappingPlacementRule should not create new instance of Groups > --- > > Key: YARN-10597 > URL: https://issues.apache.org/jira/browse/YARN-10597 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: YARN-10597.001.patch > > > As [~ahussein] pointed out in YARN-10425, no new Groups instance should be > created. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302817#comment-17302817 ] Ahmed Hussein edited comment on YARN-10501 at 3/16/21, 6:55 PM: That's confusing. I am sure [~aajisaka] has a better clue. The branch-2.10 dev-support/Jenkinsfile defines {{YETUS_ARGS+=("--findbugs-strict-precheck")}}. I do not know where {{--spotbugs-strict-precheck}} comes from on branch-2.10 builds. was (Author: ahussein): That's confusing. I am sure [~aajisaka] has a better clue. branch-2.10 -> dev-support/Jenkinsfile defines {{YETUS_ARGS+=("--findbugs-strict-precheck")}}. I do not know where {{--spotbugs-strict-precheck}} comes from on branch-2.10 builds. > Can't remove all node labels after add node label without nodemanager port > -- > > Key: YARN-10501 > URL: https://issues.apache.org/jira/browse/YARN-10501 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Critical > Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3 > > Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, > YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch, > YARN-10502-branch-2.10.003.patch > > > When a label is added to nodes without a nodemanager port, or with the WILDCARD_PORT (0) > port, not all of the label info on these nodes can be removed. > Steps to reproduce: > {code:java} > 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)" > 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode" > 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}} > 4.yarn rmadmin -replaceLabelsOnNode "server001" > 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}} > {code} > You can see that after step 4, which removes the nodemanager labels, the label info > is still present in the node info. > {code:java} > 641 case REPLACE: > 642 replaceNodeForLabels(nodeId, host.labels, labels); > 643 replaceLabelsForNode(nodeId, host.labels, labels); > 644 host.labels.clear(); > 645 host.labels.addAll(labels); > 646 for (Node node : host.nms.values()) { > 647 replaceNodeForLabels(node.nodeId, node.labels, labels); > 649 node.labels = null; > 650 } > 651 break;{code} > The cause is on line 647: when labels are added to a node without a port, both the 0 port > and the real NM port are added to the node info, and when labels are removed, > the parameter node.labels on line 647 is null, so the old > label is not removed. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302817#comment-17302817 ] Ahmed Hussein commented on YARN-10501: -- That's confusing. I am sure [~aajisaka] has a better clue. branch-2.10 -> dev-support/Jenkinsfile defines {{YETUS_ARGS+=("--findbugs-strict-precheck")}}. I do not know where {{--spotbugs-strict-precheck}} comes from on branch-2.10 builds. > Can't remove all node labels after add node label without nodemanager port > -- > > Key: YARN-10501 > URL: https://issues.apache.org/jira/browse/YARN-10501 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Critical > Fix For: 3.4.0, 3.3.1, 3.1.5, 3.2.3 > > Attachments: YARN-10501-branch-2.10.001.patch, YARN-10501.002.patch, > YARN-10501.003.patch, YARN-10501.004.patch, YARN-10502-branch-2.10.002.patch, > YARN-10502-branch-2.10.003.patch > > > When a label is added to nodes without a nodemanager port, or with the WILDCARD_PORT (0) > port, not all of the label info on these nodes can be removed. > Steps to reproduce: > {code:java} > 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)" > 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode" > 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}} > 4.yarn rmadmin -replaceLabelsOnNode "server001" > 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}} > {code} > You can see that after step 4, which removes the nodemanager labels, the label info > is still present in the node info. > {code:java} > 641 case REPLACE: > 642 replaceNodeForLabels(nodeId, host.labels, labels); > 643 replaceLabelsForNode(nodeId, host.labels, labels); > 644 host.labels.clear(); > 645 host.labels.addAll(labels); > 646 for (Node node : host.nms.values()) { > 647 replaceNodeForLabels(node.nodeId, node.labels, labels); > 649 node.labels = null; > 650 } > 651 break;{code} > The cause is on line 647: when labels are added to a node without a port, both the 0 port > and the real NM port are added to the node info, and when labels are removed, > the parameter node.labels on line 647 is null, so the old > label is not removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10585) Create a class which can convert from legacy mapping rule format to the new JSON format
[ https://issues.apache.org/jira/browse/YARN-10585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278425#comment-17278425 ] Ahmed Hussein commented on YARN-10585: -- The process for dealing with those fixes is not defined in the community; therefore, it is handled according to personal style and preference. My point regarding the difference between reverting vs. filing a new Jira: * Yetus analyses the code based on the diff. This means that splitting the PR into two phases implies that the UTs and the code analysis have not been run on the whole change together. Here are two sample examples of such cases: ** Take YARN-10352, which was committed with two findbugs errors. Both errors were lost because the report expired. The follow-up Jira YARN-10611, which was supposed to fix an import, shows only one findbugs report. ** Another example: if the follow-up Jira does not touch UT files, then Yetus won't trigger the test cases. If the follow-up fixes break the unit tests, Yetus won't detect that, leading to the merge of broken code. * While I agree that findbugs/checkstyle reports have a lot of false positives, they occasionally point to real bugs. This was the case with YARN-10352, which broke the Hadoop dependencies. * In the last couple of weeks, there were at least 3 code merges with Yetus errors, the first of which broke the Guava dependencies: 1) YARN-10352 - YARN-10611, 2) YARN-10574 - YARN-10506, 3) YARN-10585 - YARN-10612. > Create a class which can convert from legacy mapping rule format to the new > JSON format > --- > > Key: YARN-10585 > URL: https://issues.apache.org/jira/browse/YARN-10585 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10585.001.patch, YARN-10585.002.patch, > YARN-10585.003.patch > > > To make transition easier we need to create tooling to support the migration > effort. The first step is to create a class which can migrate from legacy to > the new JSON format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10612) Fix find bugs issue introduced in YARN-10585
[ https://issues.apache.org/jira/browse/YARN-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278384#comment-17278384 ] Ahmed Hussein commented on YARN-10612: -- I am ok with submitting the fix as a separate Jira as mentioned in YARN-10585 > Fix find bugs issue introduced in YARN-10585 > > > Key: YARN-10612 > URL: https://issues.apache.org/jira/browse/YARN-10612 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: YARN-10612.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10611) Fix that shaded should be used for google guava imports in YARN-10352.
[ https://issues.apache.org/jira/browse/YARN-10611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278383#comment-17278383 ] Ahmed Hussein commented on YARN-10611: -- Thanks [~zhuqi]! Can you please fix the findbugs error and confirm whether the {{TestDelegationTokenRenewer}} failure is related to the changes? > Fix that shaded should be used for google guava imports in YARN-10352. > -- > > Key: YARN-10611 > URL: https://issues.apache.org/jira/browse/YARN-10611 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10611.001.patch > > > Fix that shaded should be used for google guava imports in YARN-10352. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-10352: - Comment: was deleted (was: The problem is that at any point we have more than one commit for each main Jira ticket. This makes it hard to go between revisions without breaking the build. I suggest that the fixes are amended to the original commit and that YARN-10611 is closed, i.e. revert and recommit a patch that does not generate Yetus errors. Please make sure that the patch passes Yetus before merging. ) > Skip schedule on not heartbeated nodes in Multi Node Placement > -- > > Key: YARN-10352 > URL: https://issues.apache.org/jira/browse/YARN-10352 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: capacityscheduler, multi-node-placement > Fix For: 3.4.0 > > Attachments: YARN-10352-001.patch, YARN-10352-002.patch, > YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, > YARN-10352-006.patch, YARN-10352-007.patch, YARN-10352-008.patch, > YARN-10352-010.patch, YARN-10352.009.patch > > > When Node Recovery is enabled, stopping an NM won't unregister it from the RM. So the RM's > active nodes will still include those stopped nodes until the NM liveliness > monitor expires after the configured timeout > (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 mins, > Multi Node Placement assigns containers on those nodes. It needs to > exclude the nodes which have not heartbeated for the configured heartbeat interval > (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms), similar to > the Asynchronous Capacity Scheduler Threads > (CapacityScheduler#shouldSkipNodeSchedule). > *Repro:* > 1. Enable Multi Node Placement > (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery > Enabled (yarn.node.recovery.enabled) > 2. Have only one NM running, say worker0 > 3. Stop worker0 and start any other NM, say worker1 > 4. Submit a sleep job. The containers will time out as they are assigned to the stopped NM > worker0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein resolved YARN-10352. -- Resolution: Fixed > Skip schedule on not heartbeated nodes in Multi Node Placement > -- > > Key: YARN-10352 > URL: https://issues.apache.org/jira/browse/YARN-10352 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: capacityscheduler, multi-node-placement > Fix For: 3.4.0 > > Attachments: YARN-10352-001.patch, YARN-10352-002.patch, > YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, > YARN-10352-006.patch, YARN-10352-007.patch, YARN-10352-008.patch, > YARN-10352-010.patch, YARN-10352.009.patch > > > When Node Recovery is enabled, stopping an NM won't unregister it from the RM. So the RM's > active nodes will still include those stopped nodes until the NM liveliness > monitor expires after the configured timeout > (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 mins, > Multi Node Placement assigns containers on those nodes. It needs to > exclude the nodes which have not heartbeated for the configured heartbeat interval > (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms), similar to > the Asynchronous Capacity Scheduler Threads > (CapacityScheduler#shouldSkipNodeSchedule). > *Repro:* > 1. Enable Multi Node Placement > (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery > Enabled (yarn.node.recovery.enabled) > 2. Have only one NM running, say worker0 > 3. Stop worker0 and start any other NM, say worker1 > 4. Submit a sleep job. The containers will time out as they are assigned to the stopped NM > worker0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10585) Create a class which can convert from legacy mapping rule format to the new JSON format
[ https://issues.apache.org/jira/browse/YARN-10585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278381#comment-17278381 ] Ahmed Hussein commented on YARN-10585: -- Thank you [~shuzirra] and [~snemeth] for the clarification. [~snemeth] Sorry that I sounded negative; I did not word my comment the best way. I did not mean to comment on the quality of the work. What I meant was that the credibility of the process will diminish when it becomes a habit. I am confident you have verified the patch and the UTs. I believe you make a good point about keeping this Jira resolved while fixing the issue in YARN-10612. Apologies for reopening this Jira. > Create a class which can convert from legacy mapping rule format to the new > JSON format > --- > > Key: YARN-10585 > URL: https://issues.apache.org/jira/browse/YARN-10585 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10585.001.patch, YARN-10585.002.patch, > YARN-10585.003.patch > > > To make transition easier we need to create tooling to support the migration > effort. The first step is to create a class which can migrate from legacy to > the new JSON format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10585) Create a class which can convert from legacy mapping rule format to the new JSON format
[ https://issues.apache.org/jira/browse/YARN-10585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein resolved YARN-10585. -- Resolution: Fixed > Create a class which can convert from legacy mapping rule format to the new > JSON format > --- > > Key: YARN-10585 > URL: https://issues.apache.org/jira/browse/YARN-10585 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10585.001.patch, YARN-10585.002.patch, > YARN-10585.003.patch > > > To make transition easier we need to create tooling to support the migration > effort. The first step is to create a class which can migrate from legacy to > the new JSON format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-10585) Create a class which can convert from legacy mapping rule format to the new JSON format
[ https://issues.apache.org/jira/browse/YARN-10585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reopened YARN-10585: -- > Create a class which can convert from legacy mapping rule format to the new > JSON format > --- > > Key: YARN-10585 > URL: https://issues.apache.org/jira/browse/YARN-10585 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10585.001.patch, YARN-10585.002.patch, > YARN-10585.003.patch > > > To make transition easier we need to create tooling to support the migration > effort. The first step is to create a class which can migrate from legacy to > the new JSON format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10585) Create a class which can convert from legacy mapping rule format to the new JSON format
[ https://issues.apache.org/jira/browse/YARN-10585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278258#comment-17278258 ] Ahmed Hussein commented on YARN-10585: -- Thanks [~shuzirra] and [~snemeth] for the contribution. I am reopening this jira as it was merged with Yetus failures. For future code merges and commits, please make sure that the patch/PR does not generate Yetus errors before merging. It is not scalable to have several Jiras filed just to fix checkstyle and findbugs issues. In addition, this raises doubts about the patch's credibility overall; eventually, it causes a flood of commits and makes reverting difficult, leading to an unstable code repository. > Create a class which can convert from legacy mapping rule format to the new > JSON format > --- > > Key: YARN-10585 > URL: https://issues.apache.org/jira/browse/YARN-10585 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10585.001.patch, YARN-10585.002.patch, > YARN-10585.003.patch > > > To make transition easier we need to create tooling to support the migration > effort. The first step is to create a class which can migrate from legacy to > the new JSON format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10612) Fix find bugs issue introduced in YARN-10585
[ https://issues.apache.org/jira/browse/YARN-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278251#comment-17278251 ] Ahmed Hussein commented on YARN-10612: -- Hey [~shuzirra], can you close this jira and address the findbugs and checkstyle errors generated by the code change in the same Jira, YARN-10585? I understand that it is more work to amend changes to the merged code, but the merge should not have gone through with errors in the Yetus report. It is inconvenient for developers to navigate through Jiras and code revisions when there are such dependencies between commits. At any point, rolling back a feature would require building a chain of the multiple commits that constitute a single ticket. > Fix find bugs issue introduced in YARN-10585 > > > Key: YARN-10612 > URL: https://issues.apache.org/jira/browse/YARN-10612 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Priority: Major > Attachments: YARN-10612.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278244#comment-17278244 ] Ahmed Hussein edited comment on YARN-10352 at 2/3/21, 5:39 PM: --- The problem is that at any point we have more than one commit for each main Jira ticket. This makes it hard to go between revisions without breaking the build. I suggest that the fixes are amended to the original commit and that YARN-10611 is closed, i.e. revert and recommit a patch that does not generate Yetus errors. Please make sure that the patch passes Yetus before merging. was (Author: ahussein): The problem is that at any point we have more than one commit for each main Jira ticket. This makes it hard to go between revisions without breaking the build. I suggest that the fixes are amended to the original commit and that YARN-10611 is closed, i.e. revert and recommit a patch that does not generate errors by Yetus. Please make sure that the patch passes Yetus before merging. > Skip schedule on not heartbeated nodes in Multi Node Placement > -- > > Key: YARN-10352 > URL: https://issues.apache.org/jira/browse/YARN-10352 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: capacityscheduler, multi-node-placement > Fix For: 3.4.0 > > Attachments: YARN-10352-001.patch, YARN-10352-002.patch, > YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, > YARN-10352-006.patch, YARN-10352-007.patch, YARN-10352-008.patch, > YARN-10352-010.patch, YARN-10352.009.patch > > > When Node Recovery is enabled, stopping an NM won't unregister it from the RM. So the RM's > active nodes will still include those stopped nodes until the NM liveliness > monitor expires after the configured timeout > (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 mins, > Multi Node Placement assigns containers on those nodes. It needs to > exclude the nodes which have not heartbeated for the configured heartbeat interval > (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms), similar to > the Asynchronous Capacity Scheduler Threads > (CapacityScheduler#shouldSkipNodeSchedule). > *Repro:* > 1. Enable Multi Node Placement > (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery > Enabled (yarn.node.recovery.enabled) > 2. Have only one NM running, say worker0 > 3. Stop worker0 and start any other NM, say worker1 > 4. Submit a sleep job. The containers will time out as they are assigned to the stopped NM > worker0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278244#comment-17278244 ] Ahmed Hussein commented on YARN-10352: -- The problem is that at any point we have more than one commit for each main Jira ticket. This makes it hard to go between revisions without breaking the build. I suggest that the fixes are amended to the original commit and that YARN-10611 is closed, i.e. revert and recommit a patch that does not generate errors by Yetus. Please make sure that the patch passes Yetus before merging. > Skip schedule on not heartbeated nodes in Multi Node Placement > -- > > Key: YARN-10352 > URL: https://issues.apache.org/jira/browse/YARN-10352 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: capacityscheduler, multi-node-placement > Fix For: 3.4.0 > > Attachments: YARN-10352-001.patch, YARN-10352-002.patch, > YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, > YARN-10352-006.patch, YARN-10352-007.patch, YARN-10352-008.patch, > YARN-10352-010.patch, YARN-10352.009.patch > > > When Node Recovery is enabled, stopping an NM won't unregister it from the RM. So the RM's > active nodes will still include those stopped nodes until the NM liveliness > monitor expires after the configured timeout > (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 mins, > Multi Node Placement assigns containers on those nodes. It needs to > exclude the nodes which have not heartbeated for the configured heartbeat interval > (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms), similar to > the Asynchronous Capacity Scheduler Threads > (CapacityScheduler#shouldSkipNodeSchedule). > *Repro:* > 1. Enable Multi Node Placement > (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery > Enabled (yarn.node.recovery.enabled) > 2. Have only one NM running, say worker0 > 3. Stop worker0 and start any other NM, say worker1 > 4. Submit a sleep job. The containers will time out as they are assigned to the stopped NM > worker0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278217#comment-17278217 ] Ahmed Hussein commented on YARN-10352: -- Thanks [~zhuqi] for the prompt response. Do you know which findbugs errors Yetus reported on January 20th? It would be awesome to fix those as well in YARN-10611. > Skip schedule on not heartbeated nodes in Multi Node Placement > -- > > Key: YARN-10352 > URL: https://issues.apache.org/jira/browse/YARN-10352 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: capacityscheduler, multi-node-placement > Fix For: 3.4.0 > > Attachments: YARN-10352-001.patch, YARN-10352-002.patch, > YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, > YARN-10352-006.patch, YARN-10352-007.patch, YARN-10352-008.patch, > YARN-10352-010.patch, YARN-10352.009.patch > > > When Node Recovery is enabled, stopping an NM won't unregister it from the RM. So the RM's > active nodes will still include those stopped nodes until the NM liveliness > monitor expires after the configured timeout > (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 mins, > Multi Node Placement assigns containers on those nodes. It needs to > exclude the nodes which have not heartbeated for the configured heartbeat interval > (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms), similar to > the Asynchronous Capacity Scheduler Threads > (CapacityScheduler#shouldSkipNodeSchedule). > *Repro:* > 1. Enable Multi Node Placement > (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery > Enabled (yarn.node.recovery.enabled) > 2. Have only one NM running, say worker0 > 3. Stop worker0 and start any other NM, say worker1 > 4. Submit a sleep job. The containers will time out as they are assigned to the stopped NM > worker0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reopened YARN-10352: -- > Skip schedule on not heartbeated nodes in Multi Node Placement > -- > > Key: YARN-10352 > URL: https://issues.apache.org/jira/browse/YARN-10352 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: capacityscheduler, multi-node-placement > Fix For: 3.4.0 > > Attachments: YARN-10352-001.patch, YARN-10352-002.patch, > YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, > YARN-10352-006.patch, YARN-10352-007.patch, YARN-10352-008.patch, > YARN-10352-010.patch, YARN-10352.009.patch > > > When Node Recovery is enabled, stopping an NM won't unregister it from the RM. So the RM's > active nodes will still include those stopped nodes until the NM liveliness > monitor expires after the configured timeout > (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 mins, > Multi Node Placement assigns containers on those nodes. It needs to > exclude the nodes which have not heartbeated for the configured heartbeat interval > (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms), similar to > the Asynchronous Capacity Scheduler Threads > (CapacityScheduler#shouldSkipNodeSchedule). > *Repro:* > 1. Enable Multi Node Placement > (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery > Enabled (yarn.node.recovery.enabled) > 2. Have only one NM running, say worker0 > 3. Stop worker0 and start any other NM, say worker1 > 4. Submit a sleep job. The containers will time out as they are assigned to the stopped NM > worker0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10352) Skip schedule on not heartbeated nodes in Multi Node Placement
[ https://issues.apache.org/jira/browse/YARN-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278108#comment-17278108 ] Ahmed Hussein commented on YARN-10352: -- [~bibinchundatt] and [~ztang]. The patch introduces a guava import. Can you please submit a follow-up to this patch fixing the guava import in [TestCapacitySchedulerMultiNodes-L#28|https://github.com/apache/hadoop/commit/6fc26ad5392a2a61ace60b88ed931fed3859365d#diff-34d534eb66cd9af6d7c47a9f643d598b1ad4cef3453219457769e92fbd4a649dR28]? > Skip schedule on not heartbeated nodes in Multi Node Placement > -- > > Key: YARN-10352 > URL: https://issues.apache.org/jira/browse/YARN-10352 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: capacityscheduler, multi-node-placement > Fix For: 3.4.0 > > Attachments: YARN-10352-001.patch, YARN-10352-002.patch, > YARN-10352-003.patch, YARN-10352-004.patch, YARN-10352-005.patch, > YARN-10352-006.patch, YARN-10352-007.patch, YARN-10352-008.patch, > YARN-10352-010.patch, YARN-10352.009.patch > > > When Node Recovery is enabled, stopping an NM won't unregister it from the RM. So the RM's > active nodes will still include those stopped nodes until the NM liveliness > monitor expires after the configured timeout > (yarn.nm.liveness-monitor.expiry-interval-ms = 10 mins). During these 10 mins, > Multi Node Placement assigns containers on those nodes. It needs to > exclude the nodes which have not heartbeated for the configured heartbeat interval > (yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000ms), similar to > the Asynchronous Capacity Scheduler Threads > (CapacityScheduler#shouldSkipNodeSchedule). > *Repro:* > 1. Enable Multi Node Placement > (yarn.scheduler.capacity.multi-node-placement-enabled) + Node Recovery > Enabled (yarn.node.recovery.enabled) > 2. Have only one NM running, say worker0 > 3. Stop worker0 and start any other NM, say worker1 > 4. Submit a sleep job. The containers will time out as they are assigned to the stopped NM > worker0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
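For reference, the requested change is of this general shape. This is a sketch only; the specific guava class below is a placeholder, since the exact import at line 28 is not quoted here. The point is to use Hadoop's shaded third-party package instead of a direct guava import:

{code:java}
-import com.google.common.collect.Sets;
+import org.apache.hadoop.thirdparty.com.google.common.collect.Sets;
{code}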
[jira] [Comment Edited] (YARN-10425) Replace the legacy placement engine in CS with the new one
[ https://issues.apache.org/jira/browse/YARN-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270425#comment-17270425 ] Ahmed Hussein edited comment on YARN-10425 at 1/22/21, 8:57 PM: [~shuzirra], [~pbacsko], [~snemeth], [~BilwaST] thanks for the contribution. I have a question about the changes introduced by this Ticket. The following code block is from [CSMappingPlacementRule#L128|https://github.com/apache/hadoop/commit/567600fd80896c1c9b0db1f228368d4eb2a694a2#diff-92b5797cf7739d330364d967172e65e61a859c776d9ebe526aba03ea33039033R127] {code:java} if (groups == null) { //We cannot use Groups#getUserToGroupsMappingService here, because when //tests change the HADOOP_SECURITY_GROUP_MAPPING, Groups won't refresh its //cached instance of groups, so we might get a Group instance which //ignores the HADOOP_SECURITY_GROUP_MAPPING settings. groups = new Groups(conf); } {code} IIUC, the design of groups caching "{{Groups.cache}}" relies on the fact that Groups is a singleton. Otherwise, there will be inconsistent behavior, especially in classes like {{JniBasedUnixGroupsNetgroupMapping}} and {{ShellBasedUnixGroupsNetgroupMapping}}. Both mapping implementations have a second caching layer for the netgroups, "{{NetgroupCache}}". I have the following two concerns regarding an independent Groups instance in {{CSMappingPlacementRule.java}}: * It breaks the design, leading to inconsistent behavior that does not match what is expected. As I mentioned, {{NetgroupCache}} contents won't be defined. * Performance considerations. Allocating "N" instances of {{Groups}} means fetching the user's groups "N" times. Therefore, Guava cacheLoader's refresh will be done "N" times, and so on. Why did you decide to make that change instead of fixing the design of the unit tests? IIUC, there is a need to fix that bug in a follow-up Jira. was (Author: ahussein): [~shuzirra], [~pbacsko] thanks for the contribution. I have a question about the changes introduced by this Ticket. The following code block is from [CSMappingPlacementRule#L128|https://github.com/apache/hadoop/commit/567600fd80896c1c9b0db1f228368d4eb2a694a2#diff-92b5797cf7739d330364d967172e65e61a859c776d9ebe526aba03ea33039033R127] {code:java} if (groups == null) { //We cannot use Groups#getUserToGroupsMappingService here, because when //tests change the HADOOP_SECURITY_GROUP_MAPPING, Groups won't refresh its //cached instance of groups, so we might get a Group instance which //ignores the HADOOP_SECURITY_GROUP_MAPPING settings. groups = new Groups(conf); } {code} IIUC, the design of groups caching "{{Groups.cache}}" relies on the fact that Groups is a singleton. Otherwise, there will be inconsistent behavior, especially in classes like {{JniBasedUnixGroupsNetgroupMapping}} and {{ShellBasedUnixGroupsNetgroupMapping}}. Both mapping implementations have a second caching layer for the netgroups, "{{NetgroupCache}}". I have the following two concerns regarding an independent Groups instance in {{CSMappingPlacementRule.java}}: * It breaks the design, leading to inconsistent behavior that does not match what is expected. As I mentioned, {{NetgroupCache}} contents won't be defined. * Performance considerations. Allocating "N" instances of {{Groups}} means fetching the user's groups "N" times. Therefore, Guava cacheLoader's refresh will be done "N" times, and so on. Why did you decide to make that change instead of fixing the design of the unit tests? IIUC, there is a need to fix that bug in a follow-up Jira. 
> Replace the legacy placement engine in CS with the new one > -- > > Key: YARN-10425 > URL: https://issues.apache.org/jira/browse/YARN-10425 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10425.001.patch, YARN-10425.002.patch, > YARN-10425.003.patch, YARN-10425.004.patch, YARN-10425.005.patch, > YARN-10425.006.patch, YARN-10425.007.patch > > > Remove the UserGroupMapping and ApplicationName mapping classes, and use the > new CSMappingPlacementRule instead. Also clean up the orphan classes which are > used by these classes only. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10425) Replace the legacy placement engine in CS with the new one
[ https://issues.apache.org/jira/browse/YARN-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270425#comment-17270425 ] Ahmed Hussein commented on YARN-10425: -- [~shuzirra], [~pbacsko] thanks for the contribution. I have a question about the changes introduced by this Ticket. The following code block is from [CSMappingPlacementRule#L128|https://github.com/apache/hadoop/commit/567600fd80896c1c9b0db1f228368d4eb2a694a2#diff-92b5797cf7739d330364d967172e65e61a859c776d9ebe526aba03ea33039033R127] {code:java} if (groups == null) { //We cannot use Groups#getUserToGroupsMappingService here, because when //tests change the HADOOP_SECURITY_GROUP_MAPPING, Groups won't refresh its //cached instance of groups, so we might get a Group instance which //ignores the HADOOP_SECURITY_GROUP_MAPPING settings. groups = new Groups(conf); } {code} IIUC, the design of groups caching "{{Groups.cache}}" relies on the fact that Groups is a singleton. Otherwise, there will be inconsistent behavior, especially in classes like {{JniBasedUnixGroupsNetgroupMapping}} and {{ShellBasedUnixGroupsNetgroupMapping}}. Both mapping implementations have a second caching layer for the netgroups, "{{NetgroupCache}}". I have the following two concerns regarding an independent Groups instance in {{CSMappingPlacementRule.java}}: * It breaks the design, leading to inconsistent behavior that does not match what is expected. As I mentioned, {{NetgroupCache}} contents won't be defined. * Performance considerations. Allocating "N" instances of {{Groups}} means fetching the user's groups "N" times. Therefore, Guava cacheLoader's refresh will be done "N" times, and so on. Why did you decide to make that change instead of fixing the design of the unit tests? IIUC, there is a need to fix that bug in a follow-up Jira. > Replace the legacy placement engine in CS with the new one > -- > > Key: YARN-10425 > URL: https://issues.apache.org/jira/browse/YARN-10425 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10425.001.patch, YARN-10425.002.patch, > YARN-10425.003.patch, YARN-10425.004.patch, YARN-10425.005.patch, > YARN-10425.006.patch, YARN-10425.007.patch > > > Remove the UserGroupMapping and ApplicationName mapping classes, and use the > new CSMappingPlacementRule instead. Also clean up the orphan classes which are > used by these classes only. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
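One hedged way to address both concerns while keeping the unit tests workable — a sketch only, not the committed YARN-10597 change; the {{setGroupsForTesting}} hook is invented for illustration:

{code:java}
private Groups groups;

@VisibleForTesting
void setGroupsForTesting(Groups testGroups) {
  // Tests that switch HADOOP_SECURITY_GROUP_MAPPING can inject a fresh
  // Groups instance here instead of production code allocating new ones.
  this.groups = testGroups;
}

private Groups getGroups(Configuration conf) {
  if (groups == null) {
    // Reuse the process-wide singleton so all lookups share one cache
    // (including NetgroupCache for the netgroup-based mappings).
    groups = Groups.getUserToGroupsMappingService(conf);
  }
  return groups;
}
{code}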
[jira] [Created] (YARN-10568) TestTimelineClient#testTimelineClientCleanup fails on trunk
Ahmed Hussein created YARN-10568: Summary: TestTimelineClient#testTimelineClientCleanup fails on trunk Key: YARN-10568 URL: https://issues.apache.org/jira/browse/YARN-10568 Project: Hadoop YARN Issue Type: Bug Components: timelineclient Reporter: Ahmed Hussein {{TestTimelineClient.testTimelineClientCleanup}} gives an NPE on trunk {code:bash} java.lang.NullPointerException at org.apache.hadoop.yarn.client.api.impl.TestTimelineClient.testTimelineClientCleanup(TestTimelineClient.java:483) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10566) Elapsed time should be measured monotonicNow
Ahmed Hussein created YARN-10566: Summary: Elapsed time should be measured monotonicNow Key: YARN-10566 URL: https://issues.apache.org/jira/browse/YARN-10566 Project: Hadoop YARN Issue Type: Bug Reporter: Ahmed Hussein Assignee: Ahmed Hussein I noticed that there is widespread incorrect usage of {{System.currentTimeMillis()}} throughout the YARN code. For example: {code:java} // Some comments here long start = System.currentTimeMillis(); while (System.currentTimeMillis() - start < timeout) { // Do something } {code} Elapsed time should be measured using {{monotonicNow()}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
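The intended replacement pattern, using {{org.apache.hadoop.util.Time}} — a minimal sketch of the fix, where {{timeout}} stands in for whatever limit the surrounding code uses:

{code:java}
// Elapsed time needs a monotonic clock; System.currentTimeMillis() is
// wall-clock time and can jump when NTP adjusts the system clock.
long start = Time.monotonicNow();
while (Time.monotonicNow() - start < timeout) {
  // Do something
}
{code}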
[jira] [Updated] (YARN-10553) Refactor TestDistributedShell
[ https://issues.apache.org/jira/browse/YARN-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-10553: - Description: TestDistributedShell has grown so large over time. It has 29 tests. This is running the risk of exceeding the 30-minute limit for a single unit class. * The implementation has lots of code redundancy. * The Jira splits TestDistributedShell into three different unit tests, one for each timeline version: v1.0, v1.5, and v2.0. * Fixes the broken test {{testDSShellWithEnforceExecutionType}} was: TestDistributedShell has grown so large over time. It has 29 tests. This is running the risk of exceeding the 30-minute limit for a single unit class. * The implementation has lots of code redundancy. * It is inefficient in setup and teardown. A large percentage of the execution time is exhausted by starting the cluster and stopping the services. > Refactor TestDistributedShell > - > > Key: YARN-10553 > URL: https://issues.apache.org/jira/browse/YARN-10553 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell, test >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available, refactoring, test > Time Spent: 5h > Remaining Estimate: 0h > > TestDistributedShell has grown so large over time. It has 29 tests. > This is running the risk of exceeding the 30-minute limit for a single unit > class. > * The implementation has lots of code redundancy. > * The Jira splits TestDistributedShell into three different unit tests, one for > each timeline version: v1.0, v1.5, and v2.0. > * Fixes the broken test {{testDSShellWithEnforceExecutionType}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260098#comment-17260098 ] Ahmed Hussein edited comment on YARN-10040 at 1/6/21, 10:59 PM: Thanks [~iwasakims] for fixing {{testDSShellWithOpportunisticContainers}}! I found the fix to {{testDSShellWithEnforceExecutionType}}. It is part of the [PR-2581|https://github.com/apache/hadoop/pull/2581]. See the description of the bug in the unit test in my [comment-pr-2581|https://github.com/apache/hadoop/pull/2581#issuecomment-755765315] was (Author: ahussein): Thanks [~iwasakims] for fixing {{testDSShellWithOpportunisticContainers}}! I found the fix to{{ testDSShellWithEnforceExecutionType}}. It is part of the [PR-2581|https://github.com/apache/hadoop/pull/2581]. See the description of the bug in the unit test in my [comment-pr-2581|https://github.com/apache/hadoop/pull/2581#issuecomment-755765315] > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-10040.001.patch > > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins Test result: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260098#comment-17260098 ] Ahmed Hussein commented on YARN-10040: -- Thanks [~iwasakims] for fixing {{testDSShellWithOpportunisticContainers}}! I found the fix to {{testDSShellWithEnforceExecutionType}}. It is part of the [PR-2581|https://github.com/apache/hadoop/pull/2581]. See the description of the bug in the unit test in my [comment-pr-2581|https://github.com/apache/hadoop/pull/2581#issuecomment-755765315] > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-10040.001.patch > > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins Test result: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10556) Web-app server does not work for Timeline V2
[ https://issues.apache.org/jira/browse/YARN-10556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-10556: - Summary: Web-app server does not work for Timeline V2 (was: Web-app server does not work for V2 timeline) > Web-app server does not work for Timeline V2 > > > Key: YARN-10556 > URL: https://issues.apache.org/jira/browse/YARN-10556 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Ahmed Hussein >Priority: Major > > {{TestDistributedShell}} for timeline version 2.0 shows the following errors > in the log files, with the below exception. > YARN-3087 previously added a fix for the same issue. > There is a need to investigate whether this is a testing issue or whether the error > has resurfaced. > {code:bash} > org.apache.hadoop.yarn.webapp.WebAppException: > /v2/timeline/clusters/yarn_cluster/apps/application_1609346161655_0001: > controller for v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:247) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:155) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:152) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) > at > com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1702) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602) > at >
[jira] [Commented] (YARN-10556) Web-app server does not work for V2 timeline
[ https://issues.apache.org/jira/browse/YARN-10556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256623#comment-17256623 ] Ahmed Hussein commented on YARN-10556: -- [~gtcarrera9], [~sjlee0], [~sjlee], [~junping_du] You guys are familiar with this error since you contributed to YARN-3087. Can you please take a quick look at the above errors? > Web-app server does not work for V2 timeline > > > Key: YARN-10556 > URL: https://issues.apache.org/jira/browse/YARN-10556 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Ahmed Hussein >Priority: Major > > {{TestDistributedShell}} for timeline version 2.0 shows the following errors > in the log files, with the below exception. > YARN-3087 previously added a fix for the same issue. > There is a need to investigate whether this is a testing issue or whether the error > has resurfaced. > {code:bash} > org.apache.hadoop.yarn.webapp.WebAppException: > /v2/timeline/clusters/yarn_cluster/apps/application_1609346161655_0001: > controller for v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:247) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:155) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:152) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) > at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) > at > com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > 
org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1702) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at >
[jira] [Created] (YARN-10556) Web-app server does not work for V2 timeline
Ahmed Hussein created YARN-10556: Summary: Web-app server does not work for V2 timeline Key: YARN-10556 URL: https://issues.apache.org/jira/browse/YARN-10556 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Ahmed Hussein {{TestDistributedShell}} for timeline version 2.0 shows the following errors in the log files, with the exception below. YARN-3087 previously added a fix for the same issue. There is a need to investigate whether this is a testing issue or whether the error has resurfaced. {code:bash} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline/clusters/yarn_cluster/apps/application_1609346161655_0001: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:247) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:155) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:152) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1702) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at 
org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at
[jira] [Assigned] (YARN-10553) Refactor TestDistributedShell
[ https://issues.apache.org/jira/browse/YARN-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reassigned YARN-10553: Assignee: Ahmed Hussein > Refactor TestDistributedShell > - > > Key: YARN-10553 > URL: https://issues.apache.org/jira/browse/YARN-10553 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell, test >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: refactoring, test > > TestDistributedShell has grown very large over time. It has 29 tests. > This is running the risk of exceeding the 30-minute limit for a single unit > test class. > * The implementation has lots of code redundancy. > * It is inefficient in setup and teardown. A large percentage of the > execution time is consumed by starting the cluster and stopping the services. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10553) Refactor TestDistributedShell
Ahmed Hussein created YARN-10553: Summary: Refactor TestDistributedShell Key: YARN-10553 URL: https://issues.apache.org/jira/browse/YARN-10553 Project: Hadoop YARN Issue Type: Bug Components: distributed-shell, test Reporter: Ahmed Hussein TestDistributedShell has grown very large over time. It has 29 tests. This is running the risk of exceeding the 30-minute limit for a single unit test class. * The implementation has lots of code redundancy. * It is inefficient in setup and teardown. A large percentage of the execution time is consumed by starting the cluster and stopping the services (a shared-fixture sketch follows below). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
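A minimal sketch of the shared-fixture direction described above, assuming JUnit 4 and {{MiniYARNCluster}} (the class and test names here are illustrative, not the actual refactoring): starting the cluster once per class instead of once per test removes most of the setup/teardown cost.

{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestDistributedShellSketch {
  private static MiniYARNCluster yarnCluster;

  // Start the mini cluster once for the whole class instead of per test.
  @BeforeClass
  public static void setupCluster() {
    YarnConfiguration conf = new YarnConfiguration();
    yarnCluster = new MiniYARNCluster("TestDistributedShellSketch", 1, 1, 1);
    yarnCluster.init(conf);
    yarnCluster.start();
  }

  // Stop the services exactly once, after all tests have finished.
  @AfterClass
  public static void tearDownCluster() {
    if (yarnCluster != null) {
      yarnCluster.stop();
      yarnCluster = null;
    }
  }

  @Test
  public void testReusesRunningCluster() {
    // individual tests talk to the running cluster via yarnCluster.getConfig()
  }
}
{code}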
[jira] [Updated] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-10040: - Priority: Major (was: Blocker) > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-10040.001.patch > > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins Test result: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253737#comment-17253737 ] Ahmed Hussein edited comment on YARN-10040 at 12/22/20, 8:48 PM: - [~abmodi] Can you suggest anyone familiar with the changes done in YARN-9697? was (Author: ahussein): I changed the status of this Jira to blocker. [~abmodi] Can you suggest anyone familiar with the changes done in YARN-9697? > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-10040.001.patch > > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins Test result: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253737#comment-17253737 ] Ahmed Hussein commented on YARN-10040: -- I changed the status of this Jira to blocker. [~abmodi] Can you suggest anyone familiar with the changes done in YARN-9697? > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Assignee: Abhishek Modi >Priority: Blocker > Attachments: YARN-10040.001.patch > > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins Test result: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-10040: - Priority: Blocker (was: Major) > DistributedShell test failure on X86 and ARM > > > Key: YARN-10040 > URL: https://issues.apache.org/jira/browse/YARN-10040 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell > Environment: X86/ARM > OS: ubuntu1804 > Java 8 >Reporter: zhao bo >Assignee: Abhishek Modi >Priority: Blocker > Attachments: YARN-10040.001.patch > > > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers > * > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType > Please see the Apache Jenkins Test result: > [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/] > > These 2 tests fail on both the X86 and ARM platforms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10334) TestDistributedShell leaks resources on timeout/failure
[ https://issues.apache.org/jira/browse/YARN-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251298#comment-17251298 ] Ahmed Hussein commented on YARN-10334: -- These are the steps to fix the problem: * YARN-10536 is going to make the thread responsive in handling exceptions. * Pass a {{timeout}} argument to the {{DistributedShell.Client}}. This timeout has to be smaller than the {{TestDistributedShell.timeout}} rule. * Optional: Client and YarnClient have no interfaces to shut down/close. Adding such methods for the unit tests to call would be a good way to clean up the code (a sketch follows after this message). > TestDistributedShell leaks resources on timeout/failure > --- > > Key: YARN-10334 > URL: https://issues.apache.org/jira/browse/YARN-10334 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell, test, yarn >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: newbie, test > > {{TestDistributedShell}} times out on trunk. I found that the application > and containers will stay running in the background long after the unit test > has failed. > This causes failures of other test cases and several false-positive failures > as a result of: > * Ports will stay busy, so other test cases fail to launch. > * Unit tests fail because of memory restrictions. > Although the unit test is already broken on trunk, we do not want its > failures to affect other unit tests. > {{TestDistributedShell}} needs to be revisited to make sure that all > {{YarnClients}} and {{YarnApplications}} are closed properly at the end of > each unit test (including exceptions and timeouts). > Steps to reproduce: > {code:bash} > mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers > ## this will timeout as > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 90.234 s <<< FAILURE! - in > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > [ERROR] > testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 90.018 s <<< ERROR! 
> org.junit.runners.model.TestTimedOutException: test timed out after 9 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117) > at > org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » > TestTimedOut > [INFO] > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 > {code} > Using {{ps}} command, you can find the yarn processes are still in the > background > {code:bash} > /bin/bash -c $JRE_HOME/bin/java -Xmx512m > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster > --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 > --num_containers 2 --priority 0 --appname DistributedShell --homedir > file:/Users/ahussein >
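A sketch of the optional cleanup hook from the steps above, under the assumption that the distributed shell {{Client}} gains a shutdown method; {{sendStopSignal()}} below is a hypothetical name, not an existing API. The point is that the test releases the client and its application even when an assertion or timeout fires.

{code:java}
// Hypothetical: Client does not currently expose a shutdown hook, so
// sendStopSignal() is an assumed addition used only for illustration.
Client dsClient = new Client(new Configuration(yarnCluster.getConfig()));
try {
  dsClient.init(args);          // args: the usual distributed shell arguments
  assertTrue(dsClient.run());
} finally {
  dsClient.sendStopSignal();    // assumed hook: close the YarnClient and kill the app
}
{code}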
[jira] [Commented] (YARN-10499) TestRouterWebServicesREST fails
[ https://issues.apache.org/jira/browse/YARN-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251292#comment-17251292 ] Ahmed Hussein commented on YARN-10499: -- [~aajisaka] .. You are the man :) It feels great to see the failing list down to: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/358/#showFailuresLink {code:bash} Test Result (6 failures / -202) org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks.testSetRepIncWithUnderReplicatedBlocks org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testReadLockCanBeDisabledByConfig org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.testAMSimulatorWithNodeLabels[1] org.apache.hadoop.tools.dynamometer.TestDynamometerInfra.org.apache.hadoop.tools.dynamometer.TestDynamometerInfra org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType {code} > TestRouterWebServicesREST fails > --- > > Key: YARN-10499 > URL: https://issues.apache.org/jira/browse/YARN-10499 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: > patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt > > Time Spent: 1h > Remaining Estimate: 0h > > [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2488/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn.txt] > {noformat} > [ERROR] Failures: > [ERROR] > TestRouterWebServicesREST.testAppAttemptXML:720->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] > TestRouterWebServicesREST.testAppPriorityXML:796->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] TestRouterWebServicesREST.testAppQueueXML:846->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] TestRouterWebServicesREST.testAppStateXML:744->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] > TestRouterWebServicesREST.testAppTimeoutXML:920->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] > TestRouterWebServicesREST.testAppTimeoutsXML:896->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] TestRouterWebServicesREST.testAppXML:696->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] TestRouterWebServicesREST.testUpdateAppPriorityXML:832 > expected:<200> but was:<500> > [ERROR] TestRouterWebServicesREST.testUpdateAppQueueXML:882 expected:<200> > but was:<500> > [ERROR] TestRouterWebServicesREST.testUpdateAppStateXML:782 expected:<202> > but was:<500> > [ERROR] Errors: > [ERROR] > TestRouterWebServicesREST.testGetAppAttemptXML:1292->getAppAttempt:1464 » > ClientHandler > [ERROR] > TestRouterWebServicesREST.testGetAppsMultiThread:1337->testGetContainersXML:1317->getAppAttempt:1464 > » ClientHandler > [ERROR] > TestRouterWebServicesREST.testGetContainersXML:1317->getAppAttempt:1464 » > ClientHandler {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10536) Client in distributedShell swallows interrupt exceptions
[ https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251173#comment-17251173 ] Ahmed Hussein commented on YARN-10536: -- [~ayushsaxena], [~inigoiri], [~epayne] Can you please take a look at that small change? After it gets merged, I will work on YARN-10536 to reduce the overhead of running those tests. > Client in distributedShell swallows interrupt exceptions > > > Key: YARN-10536 > URL: https://issues.apache.org/jira/browse/YARN-10536 > Project: Hadoop YARN > Issue Type: Bug > Components: client, distributed-shell >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In {{applications.distributedshell.Client}}, the method > {{monitorApplication}} loops waiting for the following conditions: > * Application fails: reaches {{YarnApplicationState.KILLED}}, or > {{YarnApplicationState.FAILED}} > * Application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or > {{YarnApplicationState.FINISHED}} > * The time spent waiting is longer than {{clientTimeout}} (if it exists in > the parameters). > When the Client thread is interrupted, it ignores the exception: > {code:java} > // Check app status every 1 second. > try { > Thread.sleep(1000); > } catch (InterruptedException e) { > LOG.debug("Thread sleep in monitoring loop interrupted"); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10536) Client in distributedShell swallows interrupt exceptions
[ https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250778#comment-17250778 ] Ahmed Hussein commented on YARN-10536: -- The current implementation checks the timeout with reference to {{Client.clientStartTime}}. The latter is the timestamp of the object creation, as shown in this [line of code|https://github.com/apache/hadoop/blob/df7f1e5199eed917ff40618708e7641238684d24/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java#L212]. The timeout should be measured from when the client gets started (by calling {{run()}}), as in this [line of code|https://github.com/apache/hadoop/blob/df7f1e5199eed917ff40618708e7641238684d24/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java#L671]. I do not think there is a point in starting the countdown at object creation. > Client in distributedShell swallows interrupt exceptions > > > Key: YARN-10536 > URL: https://issues.apache.org/jira/browse/YARN-10536 > Project: Hadoop YARN > Issue Type: Bug > Components: client, distributed-shell >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In {{applications.distributedshell.Client}}, the method > {{monitorApplication}} loops waiting for the following conditions: > * Application fails: reaches {{YarnApplicationState.KILLED}}, or > {{YarnApplicationState.FAILED}} > * Application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or > {{YarnApplicationState.FINISHED}} > * The time spent waiting is longer than {{clientTimeout}} (if it exists in > the parameters). > When the Client thread is interrupted, it ignores the exception: > {code:java} > // Check app status every 1 second. > try { > Thread.sleep(1000); > } catch (InterruptedException e) { > LOG.debug("Thread sleep in monitoring loop interrupted"); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
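A sketch of the suggested change: capture the start time at the top of {{run()}} and pass it into the monitoring loop. The extra parameter is illustrative; the current {{monitorApplication}} takes only the application id, and the elided parts are left as comments.

{code:java}
// Illustrative only: the deadline starts when run() begins, not at
// object construction, so clientTimeout measures actual running time.
public boolean run() throws IOException, YarnException {
  final long monitorStartTime = System.currentTimeMillis();
  // ... create the application context and submit the application as before ...
  ApplicationId appId = appContext.getApplicationId();
  return monitorApplication(appId, monitorStartTime);
}

private boolean monitorApplication(ApplicationId appId, long startTime)
    throws YarnException, IOException {
  while (true) {
    // ... sleep, fetch the application report, and check its state as before ...
    if (clientTimeout > 0
        && System.currentTimeMillis() > (startTime + clientTimeout)) {
      forceKillApplication(appId);  // same kill path the client already uses on timeout
      return false;
    }
  }
}
{code}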
[jira] [Created] (YARN-10536) Client in distributedShell swallows interrupt exceptions
Ahmed Hussein created YARN-10536: Summary: Client in distributedShell swallows interrupt exceptions Key: YARN-10536 URL: https://issues.apache.org/jira/browse/YARN-10536 Project: Hadoop YARN Issue Type: Bug Components: client, distributed-shell Reporter: Ahmed Hussein Assignee: Ahmed Hussein In {{applications.distributedshell.Client}}, the method {{monitorApplication}} loops waiting for the following conditions: * Application fails: reaches {{YarnApplicationState.KILLED}}, or {{YarnApplicationState.FAILED}} * Application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or {{YarnApplicationState.FINISHED}} * The time spent waiting is longer than {{clientTimeout}} (if it exists in the parameters). When the Client thread is interrupted, it ignores the exception: {code:java} // Check app status every 1 second. try { Thread.sleep(1000); } catch (InterruptedException e) { LOG.debug("Thread sleep in monitoring loop interrupted"); } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
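A minimal sketch of the non-swallowing alternative: restore the interrupt flag and abort the monitoring loop instead of iterating again. The {{return false}} assumes the surrounding method reports success or failure to its caller, as {{monitorApplication}} does.

{code:java}
// Check app status every 1 second, but treat interruption as a request
// to stop monitoring rather than something to log and ignore.
try {
  Thread.sleep(1000);
} catch (InterruptedException e) {
  LOG.warn("Thread sleep in monitoring loop interrupted; aborting");
  Thread.currentThread().interrupt();  // preserve the interrupt for callers
  return false;                        // assumption: the method returns a success flag
}
{code}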
[jira] [Comment Edited] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247564#comment-17247564 ] Ahmed Hussein edited comment on YARN-10040 at 12/11/20, 5:28 AM: - {quote}Abhishek Modi any pointers about this? Is the code only broken or just the test. If the functionality itself has some issue we should consider reverting YARN-9697, else if this is only a test issue, we should wrap this up, if there isn't a fix available we can disable this test for time being. Let me know what is the actual situation. I can try help in whichever way possible.{quote} [~abmodi] Would you mind taking a look at the failures? was (Author: ahussein): On iOS, {{TestDistributedShell}} does not run, but I thought I would dump the error here because an NPE could be a hint to what's broken in the implementation. {code:bash} 2020-12-10 17:29:22,129 INFO [IPC Server listener on 8048] ipc.Server (Server.java:run(1344)) - IPC Server listener on 8048: starting 2020-12-10 17:29:22,131 INFO [Listener at localhost/8048] collectormanager.NMCollectorService (NMCollectorService.java:serviceStart(101)) - NMCollectorService started at localhost/127.0.0.1:8048 2020-12-10 17:29:22,131 INFO [Listener at localhost/8048] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:serviceStart(267)) - Node ID assigned is : localhost:54943 2020-12-10 17:29:22,207 INFO [Listener at localhost/8048] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(617)) - NodeManager from node localhost(cmPort: 54943 httpPort: 54946) registered with capability: , assigned nodeId localhost:54943 2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] security.NMContainerTokenSecretManager (NMContainerTokenSecretManager.java:setMasterKey(143)) - Rolling master-key for container-tokens, got key with id -210390460 2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] security.NMTokenSecretManagerInNM (NMTokenSecretManagerInNM.java:setMasterKey(143)) - Rolling master-key for container-tokens, got key with id -1432443197 2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(486)) - Registered with ResourceManager as localhost:54943 with total resource of 2020-12-10 17:29:22,212 INFO [Listener at localhost/8048] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating the current master key for generating delegation tokens 2020-12-10 17:29:22,212 INFO [Thread[Thread-282,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(701)) - Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s) 2020-12-10 17:29:22,212 INFO [Thread[Thread-282,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating the current master key for generating delegation tokens 2020-12-10 17:29:22,212 INFO [RM Event dispatcher] rmnode.RMNodeImpl (RMNodeImpl.java:handle(774)) - localhost:54943 Node Transitioned from NEW to UNHEALTHY 2020-12-10 17:29:22,214 INFO [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] distributed.NodeQueueLoadMonitor (NodeQueueLoadMonitor.java:removeNode(202)) - Node delete event for: localhost 2020-12-10 17:29:22,215 ERROR [SchedulerEventDispatcher:Event Processor] 
capacity.CapacityScheduler (CapacityScheduler.java:removeNode(2127)) - Attempting to remove non-existent node localhost:54943 2020-12-10 17:29:22,215 ERROR [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type NODE_REMOVED to the Event Dispatcher java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeFromNodeIdsByRack(NodeQueueLoadMonitor.java:405) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeNode(NodeQueueLoadMonitor.java:204) at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:399) at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:94) at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:71) at java.lang.Thread.run(Thread.java:748) 2020-12-10 17:29:22,216 INFO [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor]
[jira] [Assigned] (YARN-10334) TestDistributedShell leaks resources on timeout/failure
[ https://issues.apache.org/jira/browse/YARN-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reassigned YARN-10334: Assignee: Ahmed Hussein > TestDistributedShell leaks resources on timeout/failure > --- > > Key: YARN-10334 > URL: https://issues.apache.org/jira/browse/YARN-10334 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell, test, yarn >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: newbie, test > > {{TestDistributedShell}} times out on trunk. I found that the application > and containers will stay running in the background long after the unit test > has failed. > This causes failures of other test cases and several false-positive failures > as a result of: > * Ports will stay busy, so other test cases fail to launch. > * Unit tests fail because of memory restrictions. > Although the unit test is already broken on trunk, we do not want its > failures to affect other unit tests. > {{TestDistributedShell}} needs to be revisited to make sure that all > {{YarnClients}} and {{YarnApplications}} are closed properly at the end of > each unit test (including exceptions and timeouts). > Steps to reproduce: > {code:bash} > mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers > ## this will timeout as > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 90.234 s <<< FAILURE! - in > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > [ERROR] > testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 90.018 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 9 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117) > at > org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » > TestTimedOut > [INFO] > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 > {code} > Using {{ps}} command, you can find the yarn 
processes are still in the > background > {code:bash} > /bin/bash -c $JRE_HOME/bin/java -Xmx512m > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster > --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 > --num_containers 2 --priority 0 --appname DistributedShell --homedir > file:/Users/ahussein > 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_01/AppMaster.stdout > > 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_01/AppMaster.stderr > $JRE_HOME/bin/java -Xmx512m > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster > --container_type OPPORTUNISTIC --container_memory 128
[jira] [Commented] (YARN-10040) DistributedShell test failure on X86 and ARM
[ https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247564#comment-17247564 ] Ahmed Hussein commented on YARN-10040: -- On iOS, {{TestDistributedShell}} does not run, but I thought I would dump the error here because an NPE could be a hint to what's broken in the implementation. {code:bash} 2020-12-10 17:29:22,129 INFO [IPC Server listener on 8048] ipc.Server (Server.java:run(1344)) - IPC Server listener on 8048: starting 2020-12-10 17:29:22,131 INFO [Listener at localhost/8048] collectormanager.NMCollectorService (NMCollectorService.java:serviceStart(101)) - NMCollectorService started at localhost/127.0.0.1:8048 2020-12-10 17:29:22,131 INFO [Listener at localhost/8048] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:serviceStart(267)) - Node ID assigned is : localhost:54943 2020-12-10 17:29:22,207 INFO [Listener at localhost/8048] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(617)) - NodeManager from node localhost(cmPort: 54943 httpPort: 54946) registered with capability: , assigned nodeId localhost:54943 2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] security.NMContainerTokenSecretManager (NMContainerTokenSecretManager.java:setMasterKey(143)) - Rolling master-key for container-tokens, got key with id -210390460 2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] security.NMTokenSecretManagerInNM (NMTokenSecretManagerInNM.java:setMasterKey(143)) - Rolling master-key for container-tokens, got key with id -1432443197 2020-12-10 17:29:22,210 INFO [Listener at localhost/8048] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(486)) - Registered with ResourceManager as localhost:54943 with total resource of 2020-12-10 17:29:22,212 INFO [Listener at localhost/8048] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating the current master key for generating delegation tokens 2020-12-10 17:29:22,212 INFO [Thread[Thread-282,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(701)) - Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s) 2020-12-10 17:29:22,212 INFO [Thread[Thread-282,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating the current master key for generating delegation tokens 2020-12-10 17:29:22,212 INFO [RM Event dispatcher] rmnode.RMNodeImpl (RMNodeImpl.java:handle(774)) - localhost:54943 Node Transitioned from NEW to UNHEALTHY 2020-12-10 17:29:22,214 INFO [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] distributed.NodeQueueLoadMonitor (NodeQueueLoadMonitor.java:removeNode(202)) - Node delete event for: localhost 2020-12-10 17:29:22,215 ERROR [SchedulerEventDispatcher:Event Processor] capacity.CapacityScheduler (CapacityScheduler.java:removeNode(2127)) - Attempting to remove non-existent node localhost:54943 2020-12-10 17:29:22,215 ERROR [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type NODE_REMOVED to the Event Dispatcher java.lang.NullPointerException at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeFromNodeIdsByRack(NodeQueueLoadMonitor.java:405) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeNode(NodeQueueLoadMonitor.java:204) at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:399) at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:94) at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:71) at java.lang.Thread.run(Thread.java:748) 2020-12-10 17:29:22,216 INFO [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] event.EventDispatcher (EventDispatcher.java:run(84)) - Exiting, bbye.. 2020-12-10 17:29:22,217 INFO [Listener at localhost/8048] ipc.CallQueueManager (CallQueueManager.java:(93)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 1000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false. 2020-12-10 17:29:22,218 INFO [Socket Reader #1 for port 0] ipc.Server (Server.java:run(1265)) - Starting Socket Reader #1 for port 0 2020-12-10 17:29:22,222 INFO [Listener at localhost/54947]
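For illustration only, and as a guess at the failure mode rather than the actual YARN code: the NPE above suggests the rack index can lack an entry when a node is removed (for example, a node that went straight from NEW to UNHEALTHY and was never fully registered), in which case a null-safe removal of the following shape would avoid the crash.

{code:java}
// Hypothetical null-guard; the names mirror the stack trace, but the
// body is a sketch, not the real NodeQueueLoadMonitor implementation.
private void removeFromNodeIdsByRack(RMNode removedNode) {
  Set<NodeId> nodesOnRack = nodeIdsByRack.get(removedNode.getRackName());
  if (nodesOnRack != null) {  // the rack may never have been registered
    nodesOnRack.remove(removedNode.getNodeID());
  }
}
{code}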
[jira] [Comment Edited] (YARN-10494) CLI tool for docker-to-squashfs conversion (pure Java)
[ https://issues.apache.org/jira/browse/YARN-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242719#comment-17242719 ] Ahmed Hussein edited comment on YARN-10494 at 12/2/20, 8:54 PM: Thanks [~ccondit] for the update. I suggest creating a branch and a WIP PR to make peer reviews easier. was (Author: ahussein): Thanks [~ccondit] for the update. > CLI tool for docker-to-squashfs conversion (pure Java) > -- > > Key: YARN-10494 > URL: https://issues.apache.org/jira/browse/YARN-10494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.3.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Attachments: YARN-10494.001.patch, > docker-to-squashfs-conversion-tool-design.pdf > > > *YARN-9564* defines a docker-to-squashfs image conversion tool that relies on > python2, multiple libraries, squashfs-tools and root access in order to > convert Docker images to squashfs images for use with the runc container > runtime in YARN. > *YARN-9943* was created to investigate alternatives, as the response to > merging YARN-9564 has not been very positive. This proposal outlines the > design for a CLI conversion tool in 100% pure Java that will work out of the > box. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10494) CLI tool for docker-to-squashfs conversion (pure Java)
[ https://issues.apache.org/jira/browse/YARN-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242719#comment-17242719 ] Ahmed Hussein commented on YARN-10494: -- Thanks [~ccondit] for the update. > CLI tool for docker-to-squashfs conversion (pure Java) > -- > > Key: YARN-10494 > URL: https://issues.apache.org/jira/browse/YARN-10494 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.3.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Attachments: YARN-10494.001.patch, > docker-to-squashfs-conversion-tool-design.pdf > > > *YARN-9564* defines a docker-to-squashfs image conversion tool that relies on > python2, multiple libraries, squashfs-tools and root access in order to > convert Docker images to squashfs images for use with the runc container > runtime in YARN. > *YARN-9943* was created to investigate alternatives, as the response to > merging YARN-9564 has not been very positive. This proposal outlines the > design for a CLI conversion tool in 100% pure Java that will work out of the > box. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10468) TestNodeStatusUpdater does not handle early failure in threads
[ https://issues.apache.org/jira/browse/YARN-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reassigned YARN-10468: Assignee: Ahmed Hussein > TestNodeStatusUpdater does not handle early failure in threads > -- > > Key: YARN-10468 > URL: https://issues.apache.org/jira/browse/YARN-10468 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > > While investigating HADOOP-17314, I found the following: > * TestNodeStatusUpdater#testNMRegistration() will continue running {{while > (heartBeatID <= 3 && waitCount++ != 200) {}} even though the NM thread could > already be dead. The unit test should detect that the NM has died and terminate > sooner to release resources for other tests. > * TestNodeStatusUpdater#testNMRMConnectionConf(): same problem as described > above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10485) TimelineConnector swallows InterruptedException
Ahmed Hussein created YARN-10485: Summary: TimelineConnector swallows InterruptedException Key: YARN-10485 URL: https://issues.apache.org/jira/browse/YARN-10485 Project: Hadoop YARN Issue Type: Bug Reporter: Ahmed Hussein Assignee: Ahmed Hussein Some tests time out or take excessively long to shut down because the {{TimelineConnector}} will catch InterruptedException and go into a retry loop instead of aborting. [~daryn] reported that this makes debugging more difficult, and he suggests that the exception be thrown. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
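A sketch of the pattern [~daryn] suggests, not the actual {{TimelineConnector}} internals: a retry loop that rethrows InterruptedException immediately instead of retrying through it, so an interrupted test can shut down promptly.

{code:java}
// Illustrative retry helper: interruption aborts the loop and propagates.
<T> T retryOn(Callable<T> op, int maxRetries, long retryIntervalMs)
    throws IOException, InterruptedException {
  for (int attempt = 0; ; attempt++) {
    try {
      return op.call();
    } catch (InterruptedException ie) {
      throw ie;                        // do not swallow; let the caller abort
    } catch (Exception e) {
      if (attempt >= maxRetries) {
        throw new IOException("Retries exhausted", e);
      }
    }
    Thread.sleep(retryIntervalMs);     // may itself throw InterruptedException
  }
}
{code}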
[jira] [Commented] (YARN-10483) YARN hangs, jobs cannot be submitted, and only an RM failover or restart recovers it
[ https://issues.apache.org/jira/browse/YARN-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226909#comment-17226909 ] Ahmed Hussein commented on YARN-10483: -- Thanks [~weichiu] :) This is very helpful information. > YARN hangs, jobs cannot be submitted, and only an RM failover or restart recovers it > -- > > Key: YARN-10483 > URL: https://issues.apache.org/jira/browse/YARN-10483 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler, resourcemanager, > RM >Affects Versions: 3.1.1 >Reporter: jufeng li >Priority: Blocker > Attachments: RM_normal_state.stack, RM_unnormal_state.stack > > > YARN hangs intermittently and new jobs cannot be submitted. Inspecting the jstack logs shows capacity scheduler threads waiting indefinitely on a lock, while the RM's CPU, memory, network, and disk are all normal. The problem is almost certainly a lock issue inside the capacity scheduler. The RM jstack logs for both the normal state and the stuck state have been uploaded; hopefully someone can look into this. This bug is quite serious and directly makes production unusable. If no one answers, I will come back and ask again later. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10483) YARN hangs, jobs cannot be submitted, and only an RM failover or restart recovers it
[ https://issues.apache.org/jira/browse/YARN-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226840#comment-17226840 ] Ahmed Hussein edited comment on YARN-10483 at 11/5/20, 5:13 PM: [~Jufeng] Can you please change the title and description of this Jira to English? I do not think it is a good idea to have Jiras in multiple languages because it complicates searching for everyone. Thank You. was (Author: ahussein): [~Jufeng] Can you please change the title and description of this Jira to English? I do not think it is a good idea to have Jiras in multiple languages because we it complicates searching for everyone. Thank You. > YARN hangs, jobs cannot be submitted, and only an RM failover or restart recovers it > -- > > Key: YARN-10483 > URL: https://issues.apache.org/jira/browse/YARN-10483 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler, resourcemanager, > RM >Affects Versions: 3.1.1 >Reporter: jufeng li >Priority: Blocker > Attachments: RM_normal_state.stack, RM_unnormal_state.stack > > > YARN hangs intermittently and new jobs cannot be submitted. Inspecting the jstack logs shows capacity scheduler threads waiting indefinitely on a lock, while the RM's CPU, memory, network, and disk are all normal. The problem is almost certainly a lock issue inside the capacity scheduler. The RM jstack logs for both the normal state and the stuck state have been uploaded; hopefully someone can look into this. This bug is quite serious and directly makes production unusable. If no one answers, I will come back and ask again later. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-10483) YARN hangs, jobs cannot be submitted, and only an RM failover or restart recovers it
[ https://issues.apache.org/jira/browse/YARN-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein resolved YARN-10483. -- Release Note: Please create Jiras that make it easy for other developers to search and understand. Resolution: Information Provided > YARN hangs, jobs cannot be submitted, and only an RM failover or restart recovers it > -- > > Key: YARN-10483 > URL: https://issues.apache.org/jira/browse/YARN-10483 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler, resourcemanager, > RM >Affects Versions: 3.1.1 >Reporter: jufeng li >Priority: Blocker > Attachments: RM_normal_state.stack, RM_unnormal_state.stack > > > YARN hangs intermittently and new jobs cannot be submitted. Inspecting the jstack logs shows capacity scheduler threads waiting indefinitely on a lock, while the RM's CPU, memory, network, and disk are all normal. The problem is almost certainly a lock issue inside the capacity scheduler. The RM jstack logs for both the normal state and the stuck state have been uploaded; hopefully someone can look into this. This bug is quite serious and directly makes production unusable. If no one answers, I will come back and ask again later. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10483) YARN hangs, jobs cannot be submitted, and only an RM failover or restart recovers it
[ https://issues.apache.org/jira/browse/YARN-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226840#comment-17226840 ] Ahmed Hussein commented on YARN-10483: -- [~Jufeng] Can you please change the title and description of this Jira to English? I do not think it is a good idea to have Jiras in multiple languages because we it complicates searching for everyone. Thank You. > YARN hangs, jobs cannot be submitted, and only an RM failover or restart recovers it > -- > > Key: YARN-10483 > URL: https://issues.apache.org/jira/browse/YARN-10483 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler, resourcemanager, > RM >Affects Versions: 3.1.1 >Reporter: jufeng li >Priority: Blocker > Attachments: RM_normal_state.stack, RM_unnormal_state.stack > > > YARN hangs intermittently and new jobs cannot be submitted. Inspecting the jstack logs shows capacity scheduler threads waiting indefinitely on a lock, while the RM's CPU, memory, network, and disk are all normal. The problem is almost certainly a lock issue inside the capacity scheduler. The RM jstack logs for both the normal state and the stuck state have been uploaded; hopefully someone can look into this. This bug is quite serious and directly makes production unusable. If no one answers, I will come back and ask again later. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10468) TestNodeStatusUpdater does not handle early failure in threads
Ahmed Hussein created YARN-10468: Summary: TestNodeStatusUpdater does not handle early failure in threads Key: YARN-10468 URL: https://issues.apache.org/jira/browse/YARN-10468 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Ahmed Hussein While investigating HADOOP-17314, I found the following: * TestNodeStatusUpdater#testNMRegistration() will continue running {{while (heartBeatID <= 3 && waitCount++ != 200) {}} even though the NM thread could already be dead. The unit test should detect that the NM has died and terminate sooner to release resources for other tests (a sketch follows below). * TestNodeStatusUpdater#testNMRMConnectionConf(): same problem as described above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
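A sketch of the fix the description asks for, with the {{nmThread}} handle being illustrative: the wait loop also checks that the NM thread is still alive, so an early death fails the test immediately instead of burning the whole wait budget.

{code:java}
int waitCount = 0;
while (heartBeatID <= 3 && waitCount++ != 200) {
  // If the NM thread died early, fail now rather than waiting out the loop.
  if (!nmThread.isAlive()) {
    fail("NodeManager thread exited before reaching heartbeat " + heartBeatID);
  }
  Thread.sleep(500);
}
{code}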
[jira] [Commented] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211436#comment-17211436 ] Ahmed Hussein commented on YARN-10455: -- Thanks [~ebadger]. > TestNMProxy.testNMProxyRPCRetry is not consistent > - > > Key: YARN-10455 > URL: https://issues.apache.org/jira/browse/YARN-10455 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.1.2, 3.2.2, 3.4.0, 3.3.1, 2.10.2 > > Attachments: YARN-10455-branch-2.10.001.patch, YARN-10455.001.patch > > > The fix in YARN-8844 may fail depending on the configuration of the machine > running the test. > In some cases the address gets resolved and the unit test throws a connection > timeout exception instead. In such a scenario the JUnit test times out, and the main > reason behind the failure is swallowed by the shutdown of the clients. > To make sure that the JUnit behavior is consistent, a suggested fix is to > set the host address to {{127.0.0.1:1}}. The latter eliminates the possibility > of collisions on non-privileged ports. > Also, it is more correct to catch {{SocketException}} directly rather than > catching IOException and checking whether it is a {{SocketException}}. > > The stack trace with such failures: > {code:bash} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy > [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 24.293 s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy > [ERROR] > testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy) > Time elapsed: 20.18 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 2 > milliseconds > at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method) > at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198) > at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:821) > at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1645) > at org.apache.hadoop.ipc.Client.call(Client.java:1461) > at org.apache.hadoop.ipc.Client.call(Client.java:1414) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119) > at com.sun.proxy.$Proxy24.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) > at com.sun.proxy.$Proxy25.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:167) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) >
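A minimal sketch of the suggested change in the test; the {{getNMProxy}} helper is illustrative of how {{TestNMProxy}} builds its proxy. Binding the target to {{127.0.0.1:1}} makes the connection fail fast and deterministically, and the test then expects {{SocketException}} directly.

{code:java}
// 127.0.0.1:1 is a privileged port nothing should be listening on, so the
// connect fails immediately instead of resolving to a live endpoint.
conf.set(YarnConfiguration.NM_ADDRESS, "127.0.0.1:1");
ContainerManagementProtocol proxy = getNMProxy(conf);  // illustrative helper
try {
  proxy.startContainers(allRequests);
  fail("should not reach here");
} catch (SocketException e) {
  // expected: the connection failure surfaces directly, with no retry timeout
}
{code}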
[jira] [Commented] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211063#comment-17211063 ] Ahmed Hussein commented on YARN-10455: -- Thank you [~Jim_Brennan]! I uploaded a patch for branch-2.10. > TestNMProxy.testNMProxyRPCRetry is not consistent > - > > Key: YARN-10455 > URL: https://issues.apache.org/jira/browse/YARN-10455 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.1.2, 3.2.2, 3.4.0, 3.3.1 > > Attachments: YARN-10455-branch-2.10.001.patch, YARN-10455.001.patch > > > The fix in YARN-8844 may fail depending on the configuration of the machine > running the test. > In some cases the address gets resolved and the unit test throws a connection > timeout exception instead. In such a scenario the JUnit test times out, and the main > reason behind the failure is swallowed by the shutdown of the clients. > To make sure that the JUnit behavior is consistent, a suggested fix is to > set the host address to {{127.0.0.1:1}}. The latter eliminates the possibility > of collisions on non-privileged ports. > Also, it is more correct to catch {{SocketException}} directly rather than > catching IOException and checking whether it is a {{SocketException}}. > > The stack trace with such failures: > {code:bash} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy > [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 24.293 s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy > [ERROR] > testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy) > Time elapsed: 20.18 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 2 > milliseconds > at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method) > at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198) > at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:821) > at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1645) > at org.apache.hadoop.ipc.Client.call(Client.java:1461) > at org.apache.hadoop.ipc.Client.call(Client.java:1414) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119) > at com.sun.proxy.$Proxy24.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) > at com.sun.proxy.$Proxy25.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:167) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at >
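A minimal sketch of the suggested fix above (pointing the client at {{127.0.0.1:1}} and catching {{SocketException}} directly), assuming the proxy plumbing from the existing test; {{startContainersOnProxy}} is a hypothetical stand-in, and this is not the committed patch:
{code:java}
// A minimal sketch of the suggested fix, not the committed patch.
import java.net.SocketException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.junit.Assert;
import org.junit.Test;

public class NMProxyRetrySketch {

  @Test(timeout = 20000)
  public void testNMProxyRPCRetry() throws Exception {
    Configuration conf = new YarnConfiguration();
    // Port 1 is privileged and never serviced by the test, so the connect
    // attempt fails deterministically instead of resolving and then hanging
    // until the JUnit timeout fires.
    conf.set(YarnConfiguration.NM_ADDRESS, "127.0.0.1:1");
    try {
      startContainersOnProxy(conf);
      Assert.fail("should have thrown a SocketException");
    } catch (SocketException e) {
      // Expected: catching SocketException directly is tighter than catching
      // IOException and then asserting it is not a SocketException.
    }
  }

  // Hypothetical stand-in for the test helper that builds the
  // ContainerManagementProtocol proxy and invokes startContainers().
  private void startContainersOnProxy(Configuration conf)
      throws SocketException {
    throw new SocketException(
        "failed to connect to " + conf.get(YarnConfiguration.NM_ADDRESS));
  }
}
{code}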
[jira] [Updated] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-10455: - Attachment: YARN-10455-branch-2.10.001.patch > TestNMProxy.testNMProxyRPCRetry is not consistent > - > > Key: YARN-10455 > URL: https://issues.apache.org/jira/browse/YARN-10455 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.1.2, 3.2.2, 3.4.0, 3.3.1 > > Attachments: YARN-10455-branch-2.10.001.patch, YARN-10455.001.patch > > > The fix in YARN-8844 may fail depending on the configuration of the machine > running the test. > In some cases the address gets resolved and the unit test throws a connection > timeout exception instead. In such a scenario the JUnit times out, and the main > reason behind the failure is swallowed by the shutdown of the clients. > To make sure that the JUnit behavior is consistent, a suggested fix is to > set the host address to {{127.0.0.1:1}}. The latter eliminates the possibility > of collisions on non-privileged ports. > Also, it is more correct to catch {{SocketException}} directly rather than > catching {{IOException}} and checking that it is not a {{SocketException}}. > > The stack trace with such failures: > {code:bash} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy > [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 24.293 s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy > [ERROR] > testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy) > Time elapsed: 20.18 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 20000 > milliseconds > at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method) > at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198) > at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:821) > at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1645) > at org.apache.hadoop.ipc.Client.call(Client.java:1461) > at org.apache.hadoop.ipc.Client.call(Client.java:1414) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119) > at com.sun.proxy.$Proxy24.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) > at com.sun.proxy.$Proxy25.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:167) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at >
[jira] [Commented] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210272#comment-17210272 ] Ahmed Hussein commented on YARN-10455: -- [~leftnoteasy], [~eyang], [~Jim_Brennan] Can you please take a look at the patch? > TestNMProxy.testNMProxyRPCRetry is not consistent > - > > Key: YARN-10455 > URL: https://issues.apache.org/jira/browse/YARN-10455 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Attachments: YARN-10455.001.patch > > > The fix in YARN-8844 may fail depending on the configuration of the machine > running the test. > In some cases the address gets resolved and the unit test throws a connection > timeout exception instead. In such a scenario the JUnit times out, and the main > reason behind the failure is swallowed by the shutdown of the clients. > To make sure that the JUnit behavior is consistent, a suggested fix is to > set the host address to {{127.0.0.1:1}}. The latter eliminates the possibility > of collisions on non-privileged ports. > Also, it is more correct to catch {{SocketException}} directly rather than > catching {{IOException}} and checking that it is not a {{SocketException}}. > > The stack trace with such failures: > {code:bash} > [INFO] Running > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy > [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 24.293 s <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy > [ERROR] > testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy) > Time elapsed: 20.18 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 20000 > milliseconds > at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method) > at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198) > at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:821) > at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1645) > at org.apache.hadoop.ipc.Client.call(Client.java:1461) > at org.apache.hadoop.ipc.Client.call(Client.java:1414) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119) > at com.sun.proxy.$Proxy24.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) > at com.sun.proxy.$Proxy25.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:167) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at >
[jira] [Updated] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-10455: - Description: The fix in YARN-8844 may fail depending on the configuration of the machine running the test. In some cases the address gets resolved and the unit test throws a connection timeout exception instead. In such a scenario the JUnit times out, and the main reason behind the failure is swallowed by the shutdown of the clients. To make sure that the JUnit behavior is consistent, a suggested fix is to set the host address to {{127.0.0.1:1}}. The latter eliminates the possibility of collisions on non-privileged ports. Also, it is more correct to catch {{SocketException}} directly rather than catching {{IOException}} and checking that it is not a {{SocketException}}. The stack trace with such failures: {code:bash} [INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.293 s <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy [ERROR] testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy) Time elapsed: 20.18 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 20000 milliseconds at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method) at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198) at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:821) at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1645) at org.apache.hadoop.ipc.Client.call(Client.java:1461) at org.apache.hadoop.ipc.Client.call(Client.java:1414) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119) at com.sun.proxy.$Proxy24.startContainers(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:133) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) at com.sun.proxy.$Proxy25.startContainers(Unknown Source) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:167) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) [INFO] [INFO]
[jira] [Updated] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent
[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-10455: - Description: The fix in YARN-8844 may fail depending on the configuration of the machine running the test. In some cases the address gets resolved and the unit test throws a connection timeout exception instead. In such a scenario the JUnit times out, and the main reason behind the failure is swallowed by the shutdown of the clients. To make sure that the JUnit behavior is consistent, a suggested fix is to set the host address to {{127.0.0.1:1}}. The latter eliminates the possibility of collisions on non-privileged ports. Also, it is more correct to catch {{SocketException}} directly rather than catching {{IOException}} and checking that it is not a {{SocketException}}. was: The fix in YARN-8844 may fail depending on the configuration of the machine running the test. In some cases the address gets resolved and the unit test throws a connection timeout exception instead. In such a scenario the JUnit times out, and the main reason behind the failure is swallowed by the shutdown of the clients. To make sure that the JUnit behavior is consistent, a suggested fix is to set the host address to {{127.0.0.1:1}}. The latter eliminates the possibility of collisions on non-privileged ports. > TestNMProxy.testNMProxyRPCRetry is not consistent > - > > Key: YARN-10455 > URL: https://issues.apache.org/jira/browse/YARN-10455 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > > The fix in YARN-8844 may fail depending on the configuration of the machine > running the test. > In some cases the address gets resolved and the unit test throws a connection > timeout exception instead. In such a scenario the JUnit times out, and the main > reason behind the failure is swallowed by the shutdown of the clients. > To make sure that the JUnit behavior is consistent, a suggested fix is to set > the host address to {{127.0.0.1:1}}. The latter eliminates the possibility of > collisions on non-privileged ports. > Also, it is more correct to catch {{SocketException}} directly rather than > catching {{IOException}} and checking that it is not a {{SocketException}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10455) TestNMProxy.testNMProxyRPCRetry is not consistent
Ahmed Hussein created YARN-10455: Summary: TestNMProxy.testNMProxyRPCRetry is not consistent Key: YARN-10455 URL: https://issues.apache.org/jira/browse/YARN-10455 Project: Hadoop YARN Issue Type: Bug Reporter: Ahmed Hussein Assignee: Ahmed Hussein The fix in YARN-8844 may fail depending on the configuration of the machine running the test. In some cases the address gets resolved and the unit test throws a connection timeout exception instead. In such a scenario the JUnit times out, and the main reason behind the failure is swallowed by the shutdown of the clients. To make sure that the JUnit behavior is consistent, a suggested fix is to set the host address to {{127.0.0.1:1}}. The latter eliminates the possibility of collisions on non-privileged ports. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10337) TestRMHATimelineCollectors fails on hadoop trunk
Ahmed Hussein created YARN-10337: Summary: TestRMHATimelineCollectors fails on hadoop trunk Key: YARN-10337 URL: https://issues.apache.org/jira/browse/YARN-10337 Project: Hadoop YARN Issue Type: Bug Components: test, yarn Reporter: Ahmed Hussein {{TestRMHATimelineCollectors}} has been failing on trunk. I see it frequently in the qbt reports and the yetus reports: {code:bash} [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.TestRMHATimelineCollectors [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.95 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMHATimelineCollectors [ERROR] testRebuildCollectorDataOnFailover(org.apache.hadoop.yarn.server.resourcemanager.TestRMHATimelineCollectors) Time elapsed: 5.615 s <<< ERROR! java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.TestRMHATimelineCollectors.testRebuildCollectorDataOnFailover(TestRMHATimelineCollectors.java:105) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:80) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) [INFO] [INFO] Results: [INFO] [ERROR] Errors: [ERROR] TestRMHATimelineCollectors.testRebuildCollectorDataOnFailover:105 NullPointer [INFO] [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 [INFO] [ERROR] There are test failures. 
{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10334) TestDistributedShell leaks resources on timeout/failure
Ahmed Hussein created YARN-10334: Summary: TestDistributedShell leaks resources on timeout/failure Key: YARN-10334 URL: https://issues.apache.org/jira/browse/YARN-10334 Project: Hadoop YARN Issue Type: Bug Components: distributed-shell, test, yarn Reporter: Ahmed Hussein {{TestDistributedShell}} times out on trunk. I found that the application and containers will stay running in the background long after the unit test has failed. This causes failures of other test cases and several false positives as a result of: * Ports will stay busy, so other test cases fail to launch. * Unit tests fail because of memory restrictions. Although the unit test is already broken on trunk, we do not want its failures to cascade to other unit tests. {{TestDistributedShell}} needs to be revisited to make sure that all {{YarnClients}} and {{YarnApplications}} are closed properly at the end of each unit test (including exceptions and timeouts). Steps to reproduce: {code:bash} mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers ## this will timeout as [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 90.234 s <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell [ERROR] testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 90.018 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 90000 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) [INFO] [INFO] Results: [INFO] [ERROR] Errors: [ERROR] TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » TestTimedOut [INFO] [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 {code} Using the {{ps}} command, you can find that the yarn processes are still running in the background: {code:bash} /bin/bash -c $JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein 
1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_01/AppMaster.stdout 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_01/AppMaster.stderr $JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
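A hedged illustration of the cleanup described above: track every client the test creates and stop them in an {{@After}} method so resources are released even when a test fails mid-way. The class and helper names are illustrative, not the actual TestDistributedShell code.
{code:java}
// Illustrative cleanup sketch, assuming the test registers each client it
// creates; this is not the actual TestDistributedShell code.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.client.api.YarnClient;
import org.junit.After;

public class DistributedShellCleanupSketch {

  // Every YarnClient the test creates is registered here.
  private final List<YarnClient> createdClients = new ArrayList<>();

  protected YarnClient createTrackedClient() {
    YarnClient client = YarnClient.createYarnClient();
    createdClients.add(client);
    return client;
  }

  @After
  public void tearDown() {
    // Best-effort teardown: keep stopping the remaining clients even if one
    // stop fails, so ports and containers are released after a failed test.
    for (YarnClient client : createdClients) {
      try {
        client.stop();
      } catch (Exception e) {
        // Swallow and continue closing the remaining clients.
      }
    }
    createdClients.clear();
  }
}
{code}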
[jira] [Commented] (YARN-10176) TestTimelineAuthFilterForV2 fails intermittently
[ https://issues.apache.org/jira/browse/YARN-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105725#comment-17105725 ] Ahmed Hussein commented on YARN-10176: -- {code:bash} [INFO] Running org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 [ERROR] testPutTimelineEntities[1](org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2) Time elapsed: 6.611 s <<< FAILURE! java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.verifyEntity(TestTimelineAuthFilterForV2.java:293) at org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.testPutTimelineEntities(TestTimelineAuthFilterForV2.java:437) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) [INFO] [INFO] Results: [INFO] [ERROR] Failures: [ERROR] TestTimelineAuthFilterForV2.testPutTimelineEntities:437->verifyEntity:293 [INFO] [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0 [INFO] [ERROR] There are test failures. {code} > TestTimelineAuthFilterForV2 fails intermittently > > > Key: YARN-10176 > URL: https://issues.apache.org/jira/browse/YARN-10176 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineservice >Reporter: Ahmed Hussein >Assignee: Prabhu Joseph >Priority: Major > > TestTimelineAuthFilterForV2 fails intermittently on trunk and branch-2.10. > To reproduce the failure, execute TestTimelineAuthFilterForV2 inside a loop. > {code:bash} > [INFO] Running > org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 > [ERROR] Tests
[jira] [Resolved] (YARN-10220) RM HA times out intermittently
[ https://issues.apache.org/jira/browse/YARN-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein resolved YARN-10220. -- Resolution: Cannot Reproduce I will close it for now since I cannot reproduce the failures as reported in YARN-2710. > RM HA times out intermittently > -- > > Key: YARN-10220 > URL: https://issues.apache.org/jira/browse/YARN-10220 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3 >Reporter: Ahmed Hussein >Assignee: Bilwa S T >Priority: Major > > TestResourceTrackerOnHA, among other tests, times out intermittently: > * TestApplicationClientProtocolOnHA > * TestApplicationMasterServiceProtocolForTimelineV2 > * TestApplicationMasterServiceProtocolOnHA > {code:bash} > [INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ > hadoop-yarn-client --- > [INFO] > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running org.apache.hadoop.yarn.client.TestResourceTrackerOnHA > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 19.612 s <<< FAILURE! - in > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA > [ERROR] > testResourceTrackerOnHA(org.apache.hadoop.yarn.client.TestResourceTrackerOnHA) > Time elapsed: 19.473 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 15000 > milliseconds > at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method) > at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198) > at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:699) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:812) > at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636) > at org.apache.hadoop.ipc.Client.call(Client.java:1452) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy93.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:73) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at 
com.sun.proxy.$Proxy94.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA.testResourceTrackerOnHA(TestResourceTrackerOnHA.java:64) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at >
[jira] [Commented] (YARN-10220) RM HA times out intermittently
[ https://issues.apache.org/jira/browse/YARN-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105499#comment-17105499 ] Ahmed Hussein commented on YARN-10220: -- [~BilwaST], I could not reproduce it again for 3.x or 2.10. I think this is good news then! I will close it. > RM HA times out intermittently > -- > > Key: YARN-10220 > URL: https://issues.apache.org/jira/browse/YARN-10220 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3 >Reporter: Ahmed Hussein >Assignee: Bilwa S T >Priority: Major > > TestResourceTrackerOnHA, among other tests, times out intermittently: > * TestApplicationClientProtocolOnHA > * TestApplicationMasterServiceProtocolForTimelineV2 > * TestApplicationMasterServiceProtocolOnHA > {code:bash} > [INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ > hadoop-yarn-client --- > [INFO] > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running org.apache.hadoop.yarn.client.TestResourceTrackerOnHA > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 19.612 s <<< FAILURE! - in > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA > [ERROR] > testResourceTrackerOnHA(org.apache.hadoop.yarn.client.TestResourceTrackerOnHA) > Time elapsed: 19.473 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 15000 > milliseconds > at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method) > at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198) > at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:699) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:812) > at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636) > at org.apache.hadoop.ipc.Client.call(Client.java:1452) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy93.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:73) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy94.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA.testResourceTrackerOnHA(TestResourceTrackerOnHA.java:64) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at >
[jira] [Commented] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100858#comment-17100858 ] Ahmed Hussein commented on YARN-8959: - {quote}Could waitForThreadToWait() be switched to await(). I think it will be easiest to understand as there is already a large precedent and readers of the code will be familiar. It will also be less specialized code to maintain in the code base.{quote} Thanks [~jeagles] for the feedback. You are right, {{await()}} seems to be an easier and more stable alternative to {{waitForThreadToWait}}. I have uploaded new patches with the changes. For 3.x, the UT seems to be stable without intermittent failures. For 2.10, I see that the UT became more stable, but eventually it fails with a new error. I will create a new Jira to address that failure since it is different from the original failures that triggered this very Jira. {code:bash} [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing [ERROR] Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.597 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing [ERROR] testIncreaseContainerUnreservedWhenApplicationCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) Time elapsed: 0.265 s <<< FAILURE! java.lang.AssertionError: expected null, but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotNull(Assert.java:664) at org.junit.Assert.assertNull(Assert.java:646) at org.junit.Assert.assertNull(Assert.java:656) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) [INFO] [INFO] Results: [INFO] [ERROR] Failures: [ERROR] TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted:826 expected null, but was: [INFO] [ERROR] Tests run: 10, Failures: 1, Errors: 0, Skipped: 0 {code} > TestContainerResizing fails randomly > > > Key: YARN-8959 > URL: https://issues.apache.org/jira/browse/YARN-8959 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Assignee: Ahmed Hussein >Priority: Minor > Attachments: YARN-8959-branch-2.10.002.patch, > YARN-8959-branch-2.10.003.patch, YARN-8959-branch-2.10.004.patch, > YARN-8959.001.patch, YARN-8959.002.patch, YARN-8959.003.patch > > >
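A generic illustration of the {{await()}} pattern referenced in the comment above, using a plain {{CountDownLatch}}; the actual patch may rely on a project-specific helper instead.
{code:java}
// Generic illustration of the await() pattern; the actual patch may use a
// project-specific helper rather than a raw CountDownLatch.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AwaitSketch {

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch workDone = new CountDownLatch(1);

    Thread worker = new Thread(() -> {
      // ... the scheduler work the test needs to observe ...
      workDone.countDown(); // explicit signal instead of polling thread state
    });
    worker.start();

    // await() blocks until the signal arrives or the deadline passes, which
    // is easier to reason about than inspecting another thread's state.
    if (!workDone.await(10, TimeUnit.SECONDS)) {
      throw new AssertionError("timed out waiting for the worker");
    }
  }
}
{code}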
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959.003.patch > TestContainerResizing fails randomly > > > Key: YARN-8959 > URL: https://issues.apache.org/jira/browse/YARN-8959 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Assignee: Ahmed Hussein >Priority: Minor > Attachments: YARN-8959-branch-2.10.002.patch, > YARN-8959-branch-2.10.003.patch, YARN-8959-branch-2.10.004.patch, > YARN-8959.001.patch, YARN-8959.002.patch, YARN-8959.003.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer > {code} > testSimpleDecreaseContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.348 s <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<3072> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer(TestContainerResizing.java:210) > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted > {code} > testIncreaseContainerUnreservedWhenContainerCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.445 s <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<7168> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted(TestContainerResizing.java:729) > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer > {code} > testExcessiveReservationWhenDecreaseSameContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.321 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1024> but was:<2048> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1015) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer(TestContainerResizing.java:623) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959-branch-2.10.004.patch > TestContainerResizing fails randomly > > > Key: YARN-8959 > URL: https://issues.apache.org/jira/browse/YARN-8959 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Assignee: Ahmed Hussein >Priority: Minor > Attachments: YARN-8959-branch-2.10.002.patch, > YARN-8959-branch-2.10.003.patch, YARN-8959-branch-2.10.004.patch, > YARN-8959.001.patch, YARN-8959.002.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer > {code} > testSimpleDecreaseContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.348 s <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<3072> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer(TestContainerResizing.java:210) > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted > {code} > testIncreaseContainerUnreservedWhenContainerCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.445 s <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<7168> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted(TestContainerResizing.java:729) > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer > {code} > testExcessiveReservationWhenDecreaseSameContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.321 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1024> but was:<2048> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1015) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer(TestContainerResizing.java:623) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099351#comment-17099351 ] Ahmed Hussein commented on YARN-8959: - The unit test had a race condition in testSimpleDecreaseContainer. I could not reproduce the failure for the other test cases. * replace "assert" with GenericTestUtils.waitFor() (see the sketch below) > TestContainerResizing fails randomly > > > Key: YARN-8959 > URL: https://issues.apache.org/jira/browse/YARN-8959 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Assignee: Ahmed Hussein >Priority: Minor > Attachments: YARN-8959-branch-2.10.002.patch, > YARN-8959-branch-2.10.003.patch, YARN-8959.001.patch, YARN-8959.002.patch > > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer > {code} > testSimpleDecreaseContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.348 s <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<3072> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testSimpleDecreaseContainer(TestContainerResizing.java:210) > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted > {code} > testIncreaseContainerUnreservedWhenContainerCompleted(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.445 s <<< FAILURE! > java.lang.AssertionError: expected:<1024> but was:<7168> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1011) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenContainerCompleted(TestContainerResizing.java:729) > {code} > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer > {code} > testExcessiveReservationWhenDecreaseSameContainer(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing) > Time elapsed: 0.321 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1024> but was:<2048> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.checkUsedResource(TestContainerResizing.java:1015) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testExcessiveReservationWhenDecreaseSameContainer(TestContainerResizing.java:623) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10256) Refactor TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
[ https://issues.apache.org/jira/browse/YARN-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099227#comment-17099227 ] Ahmed Hussein commented on YARN-10256: -- Thanks [~jeagles]! > Refactor > TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic > --- > > Key: YARN-10256 > URL: https://issues.apache.org/jira/browse/YARN-10256 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: refactoring, unit-test > Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5 > > Attachments: YARN-10256.001.patch > > > In 3.x, > {{TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic}} > has redundant assertions. {{GenericTestUtils.waitFor()}} only returns once > the predicate is met; otherwise, the UT throws a timeout exception, so > re-checking the predicate afterwards adds nothing. > The redundant loop causes confusion in understanding the test unit and may > increase the possibility of failure in case the container terminates -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959-branch-2.10.003.patch
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959-branch-2.10.002.patch
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959.002.patch
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: (was: YARN-8959-branch-2.10.001.patch)
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: (was: YARN-8959-branch-2.10.002.patch)
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: (was: YARN-8959.002.patch)
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959.002.patch
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959-branch-2.10.002.patch
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: (was: YARN-8959-branch-2.10.005.patch)
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: (was: YARN-8959-branch-2.10.006.patch)
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959-branch-2.10.006.patch
[jira] [Updated] (YARN-8959) TestContainerResizing fails randomly
[ https://issues.apache.org/jira/browse/YARN-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated YARN-8959: Attachment: YARN-8959.001.patch