[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Environment: RM recovery and NM recovery enabled; Spark streaming application, a long-running application on YARN was: Hadoop 2.7.1 RM recovery and NM recovery enabled; Spark streaming application, a long-running application on YARN > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on YARN >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 has been running on YARN for a long time; > app1 has *3 containers* in total; one of them, named c1, runs on an NM named nm1. > 1. The NM nm1 was lost for some reason, but the containers on it kept running well; > 2. 10 minutes later, the RM marked this NM as lost because no heartbeats were received, so the RM told app1's AM that a container of app1 had failed because the NM was lost; app1's AM therefore killed that container through RPC and requested a new container, named c2, from the RM, which is a duplicate of c1; > 3. The administrator found nm1 lost, so he restarted it; since NM recovery was enabled, the NM restored all the containers, including container c1, but now c1's status is 'DONE'; > *A bug here*: nm1 will list container c1 in its web UI forever; > 4. The RM restarted for some reason; since RM recovery was enabled, the RM restored all the apps, including app1, and all NMs had to re-register with the RM. However, when nm1 registered with the RM, the RM found that container c1's status was DONE, so the RM told app1's AM that a container of app1 had completed; since a Spark streaming application has a fixed number of containers, the AM requested a new container, named c3, from the RM, which is a duplicate of c1. > *A bug here*: > Now, app1 has *4 containers* in total, while *c2 and c3 are duplicates of each other*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
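To make the failure mode above concrete: the AM hears about c1 twice — once as "failed, NM lost" and once more as "complete" after nm1 re-registers — and requests a replacement both times. Below is a minimal, self-contained plain-Java sketch of one possible AM-side guard that de-duplicates completed-container reports by container id; the class and method names are hypothetical illustrations, not Hadoop or Spark code.

```java
import java.util.HashSet;
import java.util.Set;

public class ReplacementTracker {
    // Container ids for which a replacement has already been requested.
    private final Set<String> replacedContainers = new HashSet<>();

    /**
     * Returns true only the first time a given container id is reported as
     * finished; later reports of the same id (e.g. the DONE status the RM
     * forwards after nm1 re-registers) are treated as duplicates.
     */
    public synchronized boolean shouldRequestReplacement(String containerId) {
        return replacedContainers.add(containerId);
    }

    public static void main(String[] args) {
        ReplacementTracker tracker = new ReplacementTracker();
        // First report: c1 failed because the NM was lost -> request c2.
        System.out.println(tracker.shouldRequestReplacement("c1")); // true
        // Second report of c1 (DONE after NM re-registration) -> no c3.
        System.out.println(tracker.shouldRequestReplacement("c1")); // false
    }
}
```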
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because of NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; *A bug here*: nm1 will list this container c1 in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. *A bug here*: Now, app1 has *4 containers* in total, while *c2 and c3 were the same*. was: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; *A bug here*: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. *A bug here*: Now, app1 has *4 containers* in total, while *c2 and c3 were the same*. > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time; > app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 
10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because of NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM, which is duplicate to c1; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; > *A bug here*: nm1 will list this container c1 in webui forever; > 4. RM restart for some reason; since RM's recovery was enabled, RM restore > all the apps including app1, and all the NM need re-register to RM; However, > when nm1 registers to RM, RM found the container c1's status was DONE, so RM > tells app1's AM that a container of app1 was complete, since spark streaming > application has fixed number of containers, so AM request a new container > named c3 from RM, which is duplicate to c1. > *A bug here*: > Now, app1 has *4 containers* in total, while *c2 and c3 were the same*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; *A bug here*: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. *A bug here* Now, app1 has *4 containers* in total, while *c2 and c3 were the same*. was: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; *A bug here*: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. *A bug here* Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time; > app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 
10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM, which is duplicate to c1; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; > *A bug here*: this NM will list this container in webui forever; > 4. RM restart for some reason; since RM's recovery was enabled, RM restore > all the apps including app1, and all the NM need re-register to RM; However, > when nm1 registers to RM, RM found the container c1's status was DONE, so RM > tells app1's AM that a container of app1 was complete, since spark streaming > application has fixed number of containers, so AM request a new container > named c3 from RM, which is duplicate to c1. > *A bug here* > Now, app1 has *4 containers* in total, while *c2 and c3 were the same*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; *A bug here*: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. *A bug here*: Now, app1 has *4 containers* in total, while *c2 and c3 were the same*. was: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; *A bug here*: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. *A bug here* Now, app1 has *4 containers* in total, while *c2 and c3 were the same*. > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time; > app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 
10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM, which is duplicate to c1; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; > *A bug here*: this NM will list this container in webui forever; > 4. RM restart for some reason; since RM's recovery was enabled, RM restore > all the apps including app1, and all the NM need re-register to RM; However, > when nm1 registers to RM, RM found the container c1's status was DONE, so RM > tells app1's AM that a container of app1 was complete, since spark streaming > application has fixed number of containers, so AM request a new container > named c3 from RM, which is duplicate to c1. > *A bug here*: > Now, app1 has *4 containers* in total, while *c2 and c3 were the same*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; *A bug here*: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. *A bug here* Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. was: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time; > app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 
10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM, which is duplicate to c1; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; > *A bug here*: this NM will list this container in webui forever; > 4. RM restart for some reason; since RM's recovery was enabled, RM restore > all the apps including app1, and all the NM need re-register to RM; However, > when nm1 registers to RM, RM found the container c1's status was DONE, so RM > tells app1's AM that a container of app1 was complete, since spark streaming > application has fixed number of containers, so AM request a new container > named c3 from RM, which is duplicate to c1. > *A bug here* > Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. was: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time; > app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 
10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM, which is duplicate to c1; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; A bug here: this NM will list this container in webui > forever; > 4. RM restart for some reason; since RM's recovery was enabled, RM restore > all the apps including app1, and all the NM need re-register to RM; However, > when nm1 registers to RM, RM found the container c1's status was DONE, so RM > tells app1's AM that a container of app1 was complete, since spark streaming > application has fixed number of containers, so AM request a new container > named c3 from RM, which is duplicate to c1. > Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. was: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in total, while c2 and c3 was the same. > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time; > app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 
10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM, which is duplicate to c1; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; A bug here: this NM will list this container in webui > forever; > 4. RM restart for some reason; since RM's recovery was enabled, RM restore > all the apps including app1, and all the NM need re-register to RM; However, > when nm1 registers to RM, RM found the container c1's status was DONE, so RM > tells app1's AM that a container of app1 was complete, since spark streaming > application has fixed number of containers, so AM request a new container > named c3 from RM, which is duplicate to c1. > Now, app1 has *4 containers* in total, while *c2 and c3 was the same*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time; app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM, which is duplicate to c1; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; However, when nm1 registers to RM, RM found the container c1's status was DONE, so RM tells app1's AM that a container of app1 was complete, since spark streaming application has fixed number of containers, so AM request a new container named c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in total, while c2 and c3 was the same. was: Case: A Spark streaming application named app1 running on yarn for a long time, app1 has a container named c1 on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time; > app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM, which is duplicate to c1; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; A bug here: this NM will list this container in webui > forever; > 4. 
RM restart for some reason; since RM's recovery was enabled, RM restore > all the apps including app1, and all the NM need re-register to RM; However, > when nm1 registers to RM, RM found the container c1's status was DONE, so RM > tells app1's AM that a container of app1 was complete, since spark streaming > application has fixed number of containers, so AM request a new container > named c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in > total, while c2 and c3 was the same. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7339) LocalityMulticastAMRMProxyPolicy should handle cancel request properly
[ https://issues.apache.org/jira/browse/YARN-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-7339: --- Attachment: YARN-7339-v6.patch Retrying as the v6 patch... > LocalityMulticastAMRMProxyPolicy should handle cancel request properly > -- > > Key: YARN-7339 > URL: https://issues.apache.org/jira/browse/YARN-7339 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-7339-v1.patch, YARN-7339-v2.patch, > YARN-7339-v3.patch, YARN-7339-v4.patch, YARN-7339-v5.patch, YARN-7339-v6.patch > > > Currently, inside AMRMProxy, LocalityMulticastAMRMProxyPolicy does not handle and split cancel requests from the AM properly: > # For a node-level cancel request, we should not treat it as a localized resource request; otherwise it can lead to an all-zero-weight issue when computing localized resource weights. > # For an ANY cancel, we should broadcast to all known subclusters, not just the ones associated with localized resources. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
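To illustrate the second point: in YARN a cancel is expressed as a resource request with numContainers set to 0, and the behavior described above is to broadcast an ANY-level cancel to every known subcluster. Below is a self-contained sketch of that routing rule with hypothetical types and method names; it is not the actual LocalityMulticastAMRMProxyPolicy code and it does not cover the first point (excluding node cancels from the locality weights).

```java
import java.util.List;

public class CancelRouting {

    static final String ANY = "*"; // the ANY resource name in YARN

    /** A cancel request asks for zero containers. */
    static boolean isCancel(int numContainers) {
        return numContainers == 0;
    }

    /**
     * @param resourceName          node/rack name, or "*" for ANY
     * @param numContainers         requested count; 0 means cancel
     * @param localizedSubclusters  subclusters that hold localized requests
     * @param allSubclusters        every subcluster known to the proxy
     */
    static List<String> targets(String resourceName, int numContainers,
                                List<String> localizedSubclusters,
                                List<String> allSubclusters) {
        if (isCancel(numContainers) && ANY.equals(resourceName)) {
            return allSubclusters;          // broadcast ANY cancels everywhere
        }
        return localizedSubclusters;        // normal locality-based split
    }

    public static void main(String[] args) {
        System.out.println(targets(ANY, 0,
            List.of("sc1"), List.of("sc1", "sc2", "sc3"))); // [sc1, sc2, sc3]
        System.out.println(targets("host1", 2,
            List.of("sc1"), List.of("sc1", "sc2", "sc3"))); // [sc1]
    }
}
```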
[jira] [Commented] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT
[ https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213737#comment-16213737 ] Botong Huang commented on YARN-7102: Thanks [~jlowe] for the double check! When I did the cherry-pick for branch-2, there were no conflicts; I think it was the auto-merge that messed up the annotation. Somehow Jenkins still didn't run for branch-2, though... > NM heartbeat stuck when responseId overflows MAX_INT > > > Key: YARN-7102 > URL: https://issues.apache.org/jira/browse/YARN-7102 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Critical > Attachments: YARN-7102-branch-2.8.v10.patch, > YARN-7102-branch-2.8.v9.patch, YARN-7102-branch-2.v9.patch, > YARN-7102-branch-2.v9.patch, YARN-7102.v1.patch, YARN-7102.v2.patch, > YARN-7102.v3.patch, YARN-7102.v4.patch, YARN-7102.v5.patch, > YARN-7102.v6.patch, YARN-7102.v7.patch, YARN-7102.v8.patch, YARN-7102.v9.patch > > > ResponseId overflow problem in the NM-RM heartbeat. This is the same as the AM-RM heartbeat issue in YARN-6640; please refer to YARN-6640 for details. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
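For context on the responseId issue: each NM-RM heartbeat exchange carries an increasing int responseId, so after roughly 2^31 heartbeats a plain increment overflows and the matching logic can get stuck. Below is a small sketch of a wrap-safe scheme that rolls over from Integer.MAX_VALUE to 0; it is illustrative only, not the actual YARN-7102/YARN-6640 patch.

```java
public class ResponseIdDemo {

    /** Next responseId, rolling over instead of overflowing to a negative value. */
    static int nextResponseId(int id) {
        return id == Integer.MAX_VALUE ? 0 : id + 1;
    }

    /** True if the heartbeat carries the id of the previous response, i.e. a resend. */
    static boolean isDuplicateHeartbeat(int heartbeatId, int lastResponseId) {
        return nextResponseId(heartbeatId) == lastResponseId;
    }

    public static void main(String[] args) {
        int last = Integer.MAX_VALUE;            // about to wrap
        int next = nextResponseId(last);         // 0, not Integer.MIN_VALUE
        System.out.println(next);                           // 0
        System.out.println(isDuplicateHeartbeat(last, next)); // true
    }
}
```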
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time, app1 has a container named c1 on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, RM restore all the apps including app1, and all the NM need re-register to RM; was: Case: A Spark streaming application named app1 running on yarn for a long time, app1 has a container named c1 on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time, > app1 has a container named c1 on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; A bug here: this NM will list this container in webui > forever; > 4. RM restart for some reason; since RM's recovery was enabled, RM restore > all the apps including app1, and all the NM need re-register to RM; -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: A Spark streaming application named app1 running on yarn for a long time, app1 has a container named c1 on a NM named nm1; 1. The NM named nm1 was lost for some reason, but the containers on it runs well; 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM tells app1's AM that a container of app1 was failed because NM lost, so app1's AM killed that container through RPC and then request a new container named c2 from RM; 3. Administrator found nm1 lost, so he restart it; since NM's recovery was enabled, NM restore all the containers including container c1, but now c1's status is 'DONE'; A bug here: this NM will list this container in webui forever; 4. RM restart for some reason; since RM's recovery was enabled, was: Case: > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: > A Spark streaming application named app1 running on yarn for a long time, > app1 has a container named c1 on a NM named nm1; > 1. The NM named nm1 was lost for some reason, but the containers on it runs > well; > 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM > tells app1's AM that a container of app1 was failed because NM lost, so > app1's AM killed that container through RPC and then request a new container > named c2 from RM; > 3. Administrator found nm1 lost, so he restart it; since NM's recovery was > enabled, NM restore all the containers including container c1, but now c1's > status is 'DONE'; A bug here: this NM will list this container in webui > forever; > 4. RM restart for some reason; since RM's recovery was enabled, -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
[ https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rangjiaheng updated YARN-7377: -- Description: Case: > Duplicate Containers allocated for Long-Running Application after NM lost and > restart and RM restart > > > Key: YARN-7377 > URL: https://issues.apache.org/jira/browse/YARN-7377 > Project: Hadoop YARN > Issue Type: Bug > Components: applications, nodemanager, RM, yarn >Affects Versions: 3.0.0-alpha3 > Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; > Spark streaming application, a long-running application on yarn >Reporter: rangjiaheng > Labels: patch > > Case: -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart
rangjiaheng created YARN-7377: - Summary: Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart Key: YARN-7377 URL: https://issues.apache.org/jira/browse/YARN-7377 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager, RM, yarn Affects Versions: 3.0.0-alpha3 Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled; Spark streaming application, a long-running application on yarn Reporter: rangjiaheng -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213695#comment-16213695 ] Hadoop QA commented on YARN-7276: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 33s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 39m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7276 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893391/YARN-7276.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f2ff7417a4fa 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 248d9b6 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18068/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18068/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project:
[jira] [Commented] (YARN-7376) YARN top ACLs
[ https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213694#comment-16213694 ] Jonathan Hung commented on YARN-7376: - Fixed the unit test in 002. Also fixed compatibility by setting the default ACL to *. > YARN top ACLs > - > > Key: YARN-7376 > URL: https://issues.apache.org/jira/browse/YARN-7376 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: YARN-7376.001.patch, YARN-7376.002.patch > > > Currently, YARN top can be invoked by everyone, but we want to avoid a scenario where random users invoke YARN top and potentially leave it running, so we can implement ACLs to prevent this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
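To illustrate the ACL semantics being discussed: the ACL lists the users permitted to run "yarn top", and defaulting it to * preserves the existing allow-everyone behavior for compatibility. Below is a plain-Java sketch of that check; the class and configuration handling are hypothetical, not the YARN-7376 patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TopAclCheck {

    private final boolean allowAll;
    private final Set<String> allowedUsers;

    /** aclString is a comma-separated user list; "*" means everyone is allowed. */
    public TopAclCheck(String aclString) {
        String trimmed = aclString.trim();
        this.allowAll = "*".equals(trimmed);
        this.allowedUsers = new HashSet<>(Arrays.asList(trimmed.split("\\s*,\\s*")));
    }

    /** Whether the given user may run "yarn top" under this ACL. */
    public boolean isAllowed(String user) {
        return allowAll || allowedUsers.contains(user);
    }

    public static void main(String[] args) {
        System.out.println(new TopAclCheck("*").isAllowed("alice"));         // true (default)
        System.out.println(new TopAclCheck("admin,ops").isAllowed("alice")); // false
    }
}
```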
[jira] [Updated] (YARN-7376) YARN top ACLs
[ https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-7376: Attachment: YARN-7376.002.patch > YARN top ACLs > - > > Key: YARN-7376 > URL: https://issues.apache.org/jira/browse/YARN-7376 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: YARN-7376.001.patch, YARN-7376.002.patch > > > Currently YARN top can be invoked by everyone. But we want to avoid a > scenario where random users invoke YARN top, and potentially leave it > running. So we can implement ACLs to prevent this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7376) YARN top ACLs
[ https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213662#comment-16213662 ] Hadoop QA commented on YARN-7376: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 43s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 17s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 93m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7376 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893367/YARN-7376.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 58743e6f5336 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 248d9b6 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/18066/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt | |
[jira] [Updated] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-7276: -- Attachment: YARN-7276.005.patch Thanks [~subru] for the comments. I added the multithreaded test in 005. The rest were already done in 004. > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: YARN-7276.000.patch, YARN-7276.001.patch, > YARN-7276.002.patch, YARN-7276.003.patch, YARN-7276.004.patch, > YARN-7276.005.patch > > > While testing YARN-3661, I found a few issues with the REST interface in the > Router: > * No support for empty content (error 204) > * Media type support > * Attributes in {{FederationInterceptorREST}} > * Support for empty states and labels > * DefaultMetricsSystem initialization is missing -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
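For readers following the "No support for empty content (error 204)" item above, the sketch below shows the generic JAX-RS pattern such a fix usually takes: return 204 No Content when there is nothing to serialize instead of failing on an empty body. All class and method names here are illustrative placeholders, not the actual FederationInterceptorREST code.
{code}
// Minimal JAX-RS sketch (assumed shape, not the YARN-7276 patch):
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import java.util.List;

@Path("/ws/v1/cluster/apps")
public class AppsResourceSketch {
  @GET
  @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
  public Response getApps() {
    List<String> apps = fetchApps();        // placeholder for the real lookup
    if (apps == null || apps.isEmpty()) {
      return Response.noContent().build();  // HTTP 204, no entity body
    }
    return Response.ok(apps).build();
  }

  private List<String> fetchApps() {
    return java.util.Collections.emptyList();
  }
}
{code}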
[jira] [Commented] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213645#comment-16213645 ] Hadoop QA commented on YARN-7276: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 39m 12s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7276 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893380/YARN-7276.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 06a342439c75 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 248d9b6 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18067/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18067/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project:
[jira] [Commented] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213604#comment-16213604 ] Hadoop QA commented on YARN-7276: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 0s{color} | {color:red} hadoop-yarn-server-router in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 39m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.router.webapp.TestRouterWebServicesREST | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7276 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893359/YARN-7276.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 667caa416fc5 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 248d9b6 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/18065/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18065/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18065/console | | Powered by | Apache Yetus
[jira] [Updated] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-7276: -- Attachment: YARN-7276.004.patch > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: YARN-7276.000.patch, YARN-7276.001.patch, > YARN-7276.002.patch, YARN-7276.003.patch, YARN-7276.004.patch > > > While testing YARN-3661, I found a few issues with the REST interface in the > Router: > * No support for empty content (error 204) > * Media type support > * Attributes in {{FederationInterceptorREST}} > * Support for empty states and labels > * DefaultMetricsSystem initialization is missing -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7318) Fix shell check warnings of SLS.
[ https://issues.apache.org/jira/browse/YARN-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213574#comment-16213574 ] Hudson commented on YARN-7318: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13120 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13120/]) YARN-7318. Fix shell check warnings of SLS. (Gergely Novák via wangda) (wangda: rev 281d83604df8341c210cee39bdc745ca793c5afa) * (edit) hadoop-tools/hadoop-sls/src/main/bin/rumen2sls.sh * (edit) hadoop-tools/hadoop-sls/src/main/bin/slsrun.sh > Fix shell check warnings of SLS. > > > Key: YARN-7318 > URL: https://issues.apache.org/jira/browse/YARN-7318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Gergely Novák > Fix For: 3.0.0 > > Attachments: YARN-7318.001.patch > > > Warnings like: > {code} > hadoop-tools/hadoop-sls/src/main/bin/rumen2sls.sh:75:77: warning: args is > referenced but not assigned. [SC2154] > hadoop-tools/hadoop-sls/src/main/bin/slsrun.sh:113:61: warning: args is > referenced but not assigned. [SC2154] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-7351) High CPU usage issue in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7351: Comment: was deleted (was: -1 after applying patch 003, query started failing when it is used in combination with patch for YARN-7326. {code} [yarn@eyang-1 hadoop-3.1.0-SNAPSHOT]$ dig @localhost -p 5353 . ;; Warning: query response not set ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @localhost -p 5353 . ; (2 servers found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 48353 ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; WARNING: recursion requested but not available ;; Query time: 9 msec ;; SERVER: 127.0.0.1#5353(127.0.0.1) ;; WHEN: Fri Oct 20 19:49:49 UTC 2017 ;; MSG SIZE rcvd: 12 {code} This is because the response payload is bigger than UDP datagram. TCP channel for response is working using the initialized NIOTCPChannel.) > High CPU usage issue in RegistryDNS > --- > > Key: YARN-7351 > URL: https://issues.apache.org/jira/browse/YARN-7351 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-7351.yarn-native-services.01.patch, > YARN-7351.yarn-native-services.02.patch, > YARN-7351.yarn-native-services.03.patch, > YARN-7351.yarn-native-services.03.patch > > > Thanks [~aw] for finding this issue. > The current RegistryDNS implementation is always running on high CPU and > pretty much eats one core. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
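The deleted comment above alludes to standard DNS behavior when a response does not fit in a UDP datagram: the server truncates the reply and sets the TC flag so the client retries over TCP. The sketch below illustrates that mechanism only; it is not RegistryDNS code.
{code}
// Illustrative sketch of DNS truncation over UDP (not RegistryDNS code).
// A DNS header is 12 bytes; the TC (truncated) flag is bit 0x02 of byte 2.
final class DnsUdpTruncationSketch {
  static final int UDP_PAYLOAD_LIMIT = 512;  // classic DNS limit without EDNS0

  static byte[] prepareUdpResponse(byte[] fullResponse) {
    if (fullResponse.length <= UDP_PAYLOAD_LIMIT) {
      return fullResponse;
    }
    // Too big for the datagram: send a truncated reply with TC set so the
    // client (e.g. dig) re-asks the same question over TCP.
    byte[] truncated = java.util.Arrays.copyOf(fullResponse, UDP_PAYLOAD_LIMIT);
    truncated[2] |= 0x02;                    // set the TC flag in the header
    return truncated;
  }
}
{code}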
[jira] [Commented] (YARN-7217) Improve API service usability for updating service spec and state
[ https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213538#comment-16213538 ] Eric Yang commented on YARN-7217: - The findbugs warning is not introduced by this JIRA. [~billie.rinaldi] [~jianhe] . Would you mind to take another pass on patch 5? Thank you > Improve API service usability for updating service spec and state > - > > Key: YARN-7217 > URL: https://issues.apache.org/jira/browse/YARN-7217 > Project: Hadoop YARN > Issue Type: Task > Components: api, applications >Reporter: Eric Yang >Assignee: Eric Yang > Attachments: YARN-7217.yarn-native-services.001.patch, > YARN-7217.yarn-native-services.002.patch, > YARN-7217.yarn-native-services.003.patch, > YARN-7217.yarn-native-services.004.patch, > YARN-7217.yarn-native-services.005.patch > > > API service for deploy, and manage YARN services have several limitations. > {{updateService}} API provides multiple functions: > # Stopping a service. > # Start a service. > # Increase or decrease number of containers. (This was removed in YARN-7323). > The overloading is buggy depending on how the configuration should be applied. > h4. Scenario 1 > A user retrieves Service object from getService call, and the Service object > contains state: STARTED. The user would like to increase number of > containers for the deployed service. The JSON has been updated to increase > container count. The PUT method does not actually increase container count. > h4. Scenario 2 > A user retrieves Service object from getService call, and the Service object > contains state: STOPPED. The user would like to make a environment > configuration change. The configuration does not get updated after PUT > method. > This is possible to address by rearranging the logic of START/STOP after > configuration update. However, there are other potential combinations that > can break PUT method. For example, user like to make configuration changes, > but not yet restart the service until a later time. > h4. Scenario 3 > There is no API to list all deployed applications by the same user. > h4. Scenario 4 > Desired state (spec) and current state are represented by the same Service > object. There is no easy way to identify "state" is desired state to reach > or, the current state of the service. It would be nice to have ability to > retrieve both desired state, and current state with separated entry points. > By implementing /spec and /state, it can resolve this problem. > h4. Scenario 5 > List all services deploy by the same user can trigger a directory listing > operation on namenode if hdfs is used as storage for metadata. When hundred > of users use Service UI to view or deploy applications, this will trigger > denial of services attack on namenode. The sparse small metadata files also > reduce efficiency of Namenode memory usage. Hence, a cache layer for storing > service metadata can reduce namenode stress. > h3. Proposed change > ApiService can separate the PUT method into two PUT methods for configuration > changes vs operation changes. 
New API could look like: > {code} > @PUT > /ws/v1/services/[service_name]/spec > Request Data: > { > "name": "amp", > "components": [ > { > "name": "mysql", > "number_of_containers": 2, > "artifact": { > "id": "centos/mysql-57-centos7:latest", > "type": "DOCKER" > }, > "run_privileged_container": false, > "launch_command": "", > "resource": { > "cpus": 1, > "memory": "2048" > }, > "configuration": { > "env": { > "MYSQL_USER":"${USER}", > "MYSQL_PASSWORD":"password" > } > } > } > ], > "quicklinks": { > "Apache Document Root": > "http://httpd.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/;, > "PHP MyAdmin": "http://phpmyadmin.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/; > } > } > {code} > {code} > @PUT > /ws/v1/services/[service_name]/state > Request data: > { > "name": "amp", > "components": [ > { > "name": "mysql", > "state": "STOPPED" > } > ] > } > {code} > SOLR can be used to cache Yarnfile to improve lookup performance and reduce > stress of namenode small file problems and high frequency lookup. SOLR is > chosen for caching metadata because its indexing feature can be used to build > full text search for application catalog as well. > For service that requires configuration changes to increase or decrease node > count. The calling sequence is: > {code} > # GET /ws/v1/services/{service_name}/spec > # Change number_of_containers to desired number. > # PUT /ws/v1/services/{service_name}/spec to update the spec. > # PUT /ws/v1/services/{service_name}/state to stop existing
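As a reading aid for the proposed endpoints quoted above, a minimal hypothetical client call against the state endpoint might look like the following; the URL shape and JSON body are taken from the quoted proposal, while the host, port and service name are placeholders.
{code}
// Hypothetical client sketch for the proposed PUT /ws/v1/services/{name}/state
// endpoint; not a finished API, just the calling convention described above.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StopComponentSketch {
  public static void main(String[] args) throws Exception {
    String body = "{ \"name\": \"amp\", \"components\": ["
        + "{ \"name\": \"mysql\", \"state\": \"STOPPED\" } ] }";
    // apiserver.example.com:8088 is a placeholder address.
    URL url = new URL("http://apiserver.example.com:8088/ws/v1/services/amp/state");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}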
[jira] [Commented] (YARN-7326) Some issues in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213535#comment-16213535 ] Hadoop QA commented on YARN-7326: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 15s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-7326 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893368/YARN-7326.yarn-native-services.003.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18064/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Some issues in RegistryDNS > -- > > Key: YARN-7326 > URL: https://issues.apache.org/jira/browse/YARN-7326 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Eric Yang > Attachments: YARN-7326.yarn-native-services.001.patch, > YARN-7326.yarn-native-services.002.patch, > YARN-7326.yarn-native-services.003.patch > > > [~aw] helped to identify these issues: > Now some general bad news, not related to this patch: > Ran a few queries, but this one is a bit concerning: > {code} > root@ubuntu:/hadoop/logs# dig @localhost -p 54 . > ;; Warning: query response not set > ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 . > ; (2 servers found) > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794 > ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 > ;; WARNING: recursion requested but not available > ;; Query time: 0 msec > ;; SERVER: 127.0.0.1#54(127.0.0.1) > ;; WHEN: Thu Oct 12 16:04:54 PDT 2017 > ;; MSG SIZE rcvd: 12 > root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr . > ;; Connection to ::1#54(::1) for . failed: connection refused. > ;; communications error to 127.0.0.1#54: end of file > root@ubuntu:/hadoop/logs# > {code} > It looks like it effectively fails when asked about a root zone, which is bad. > It's also kind of interesting in what it does and doesn't log. Probably > should be configured to rotate logs based on size not date. > The real showstopper though: RegistryDNS basically eats a core. It is running > with 100% cpu utilization with and without jsvc. On my laptop, this is > triggering my fan. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7326) Some issues in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7326: Attachment: (was: YARN-7326.yarn-native-services.003.patch) > Some issues in RegistryDNS > -- > > Key: YARN-7326 > URL: https://issues.apache.org/jira/browse/YARN-7326 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Eric Yang > Attachments: YARN-7326.yarn-native-services.001.patch, > YARN-7326.yarn-native-services.002.patch, > YARN-7326.yarn-native-services.003.patch > > > [~aw] helped to identify these issues: > Now some general bad news, not related to this patch: > Ran a few queries, but this one is a bit concerning: > {code} > root@ubuntu:/hadoop/logs# dig @localhost -p 54 . > ;; Warning: query response not set > ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 . > ; (2 servers found) > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794 > ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 > ;; WARNING: recursion requested but not available > ;; Query time: 0 msec > ;; SERVER: 127.0.0.1#54(127.0.0.1) > ;; WHEN: Thu Oct 12 16:04:54 PDT 2017 > ;; MSG SIZE rcvd: 12 > root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr . > ;; Connection to ::1#54(::1) for . failed: connection refused. > ;; communications error to 127.0.0.1#54: end of file > root@ubuntu:/hadoop/logs# > {code} > It looks like it effectively fails when asked about a root zone, which is bad. > It's also kind of interesting in what it does and doesn't log. Probably > should be configured to rotate logs based on size not date. > The real showstopper though: RegistryDNS basically eats a core. It is running > with 100% cpu utilization with and without jsvc. On my laptop, this is > triggering my fan. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7326) Some issues in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7326: Attachment: YARN-7326.yarn-native-services.003.patch > Some issues in RegistryDNS > -- > > Key: YARN-7326 > URL: https://issues.apache.org/jira/browse/YARN-7326 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Eric Yang > Attachments: YARN-7326.yarn-native-services.001.patch, > YARN-7326.yarn-native-services.002.patch, > YARN-7326.yarn-native-services.003.patch > > > [~aw] helped to identify these issues: > Now some general bad news, not related to this patch: > Ran a few queries, but this one is a bit concerning: > {code} > root@ubuntu:/hadoop/logs# dig @localhost -p 54 . > ;; Warning: query response not set > ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 . > ; (2 servers found) > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794 > ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 > ;; WARNING: recursion requested but not available > ;; Query time: 0 msec > ;; SERVER: 127.0.0.1#54(127.0.0.1) > ;; WHEN: Thu Oct 12 16:04:54 PDT 2017 > ;; MSG SIZE rcvd: 12 > root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr . > ;; Connection to ::1#54(::1) for . failed: connection refused. > ;; communications error to 127.0.0.1#54: end of file > root@ubuntu:/hadoop/logs# > {code} > It looks like it effectively fails when asked about a root zone, which is bad. > It's also kind of interesting in what it does and doesn't log. Probably > should be configured to rotate logs based on size not date. > The real showstopper though: RegistryDNS basically eats a core. It is running > with 100% cpu utilization with and without jsvc. On my laptop, this is > triggering my fan. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7376) YARN top ACLs
[ https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213529#comment-16213529 ] Jonathan Hung commented on YARN-7376: - Attached 001 patch which adds {{yarn.top.acl}} for ACLs on client side. [~vvasudev], can you take a look? Thanks! > YARN top ACLs > - > > Key: YARN-7376 > URL: https://issues.apache.org/jira/browse/YARN-7376 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: YARN-7376.001.patch > > > Currently YARN top can be invoked by everyone. But we want to avoid a > scenario where random users invoke YARN top, and potentially leave it > running. So we can implement ACLs to prevent this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
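A minimal sketch of what a client-side gate on the proposed {{yarn.top.acl}} property could look like, using Hadoop's existing AccessControlList helper. The property name comes from the comment above; the default value and surrounding wiring are assumptions, not the contents of the 001 patch.
{code}
// Sketch only: gate "yarn top" on an ACL property before starting the view.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

public class TopAclCheckSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "*" (everyone) is an assumed default; an admin would narrow it,
    // e.g. to "yarn,ops devs" (users, then groups).
    AccessControlList acl = new AccessControlList(conf.get("yarn.top.acl", "*"));
    UserGroupInformation caller = UserGroupInformation.getCurrentUser();
    if (!acl.isUserAllowed(caller)) {
      System.err.println(caller.getShortUserName() + " is not allowed to run yarn top");
      System.exit(1);
    }
    // ...continue into the normal top refresh loop...
  }
}
{code}
This mirrors how other YARN ACLs (admin ACLs, queue ACLs) are expressed, so operators can reuse the familiar "users groups" syntax.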
[jira] [Updated] (YARN-7376) YARN top ACLs
[ https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-7376: Attachment: YARN-7376.001.patch > YARN top ACLs > - > > Key: YARN-7376 > URL: https://issues.apache.org/jira/browse/YARN-7376 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: YARN-7376.001.patch > > > Currently YARN top can be invoked by everyone. But we want to avoid a > scenario where random users invoke YARN top, and potentially leave it > running. So we can implement ACLs to prevent this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7376) YARN top ACLs
Jonathan Hung created YARN-7376: --- Summary: YARN top ACLs Key: YARN-7376 URL: https://issues.apache.org/jira/browse/YARN-7376 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung Assignee: Jonathan Hung Currently YARN top can be invoked by everyone. But we want to avoid a scenario where random users invoke YARN top, and potentially leave it running. So we can implement ACLs to prevent this. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7375) NPE in the RM Webapp when HA is enabled and the active RM fails
[ https://issues.apache.org/jira/browse/YARN-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-7375: Description: Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:327) at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:133) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280) at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58) Steps: 1. RM HA is enabled 2. Started a service 3. Active RM failed. 4. Switched to the Web UI of Standby RM 5. Clicked to view the containers of the previous started application and landed to an error page. 6. The NPE mentioned above was found in the standby RM logs was: Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:327) at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:133) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280) at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58) > NPE in the RM Webapp when HA is enabled and the active RM fails > --- > > Key: YARN-7375 > URL: https://issues.apache.org/jira/browse/YARN-7375 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chandni Singh > > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:327) > at > 
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:133) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280) > at > org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) > at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) > at >
[jira] [Created] (YARN-7375) NPE in the RM Webapp when HA is enabled and the active RM fails
Chandni Singh created YARN-7375: --- Summary: NPE in the RM Webapp when HA is enabled and the active RM fails Key: YARN-7375 URL: https://issues.apache.org/jira/browse/YARN-7375 Project: Hadoop YARN Issue Type: Bug Reporter: Chandni Singh Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:327) at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:133) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280) at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
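For context on the failure mode, the usual remedy for this class of NPE is to make the web block tolerate attempt state that has not been recovered yet on a freshly active RM. The sketch below is hypothetical; every name in it is an illustrative placeholder, not a real YARN class, and it is not the eventual YARN-7375 fix.
{code}
// Hypothetical illustration of the defensive pattern, not the actual patch.
final class AttemptBlockSketch {
  interface AppAttemptView {
    java.util.List<String> resourceRequests();
  }

  static String render(AppAttemptView attempt) {
    // Right after failover the newly active RM may not have this state yet.
    if (attempt == null || attempt.resourceRequests() == null) {
      return "Attempt information is not yet available on this ResourceManager.";
    }
    return "Outstanding resource requests: " + attempt.resourceRequests().size();
  }
}
{code}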
[jira] [Resolved] (YARN-6142) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang resolved YARN-6142. -- Resolution: Information Provided Fix Version/s: 3.0.0 Protobuf and JACC analysis done. Will continue rolling upgrade reviews at HDFS-11096. > Support rolling upgrade between 2.x and 3.x > --- > > Key: YARN-6142 > URL: https://issues.apache.org/jira/browse/YARN-6142 > Project: Hadoop YARN > Issue Type: Task > Components: rolling upgrade >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: Ray Chiang >Priority: Blocker > Fix For: 3.0.0 > > > Counterpart JIRA to HDFS-11096. We need to: > * examine YARN and MR's JACC report for binary and source incompatibilities > * run the [PB > differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405] > that Sean wrote for HDFS-11096 for the YARN PBs. > * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are > automated and something we can run upstream. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-7276: -- Attachment: YARN-7276.003.patch > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: YARN-7276.000.patch, YARN-7276.001.patch, > YARN-7276.002.patch, YARN-7276.003.patch > > > While testing YARN-3661, I found a few issues with the REST interface in the > Router: > * No support for empty content (error 204) > * Media type support > * Attributes in {{FederationInterceptorREST}} > * Support for empty states and labels > * DefaultMetricsSystem initialization is missing -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213458#comment-16213458 ] Ray Chiang commented on YARN-6142: -- Minor issues found by JACC. YARN-2696 - CapacityScheduler#getQueueComparator() split into partitioned/nonparititoned comparator YARN-3139 - Removed synchronized from CapacityScheduler#getContainerTokenSecretManager() - Removed synchronized from CapacityScheduler#getRMContext() - Removed synchronized from CapacityScheduler#setRMContext() YARN-3413 - YarnClient#getClusterNodeLabels() changed return type YARN-3866 - Major refactor in Public APIs for AM-RM for handling container resizing. - Change went into both 2.8.0 and 3.0.0. YARN-3873 - CapacityScheduler#getApplicationComparator() removed YARN-4593 - AbstractService#getConfig() removed synchronized YARN-5077 - Removed SchedulingPolicy#checkIfAMResourceUsageOverLimit() YARN-5221 - AllocateRequest / AllocateResponse has methods changed from Public/Stable to Public/Unstable YARN-5713 - Update jackson affects TimelineUtils#dumpTimelineRecordtoJSON() > Support rolling upgrade between 2.x and 3.x > --- > > Key: YARN-6142 > URL: https://issues.apache.org/jira/browse/YARN-6142 > Project: Hadoop YARN > Issue Type: Task > Components: rolling upgrade >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: Ray Chiang >Priority: Blocker > > Counterpart JIRA to HDFS-11096. We need to: > * examine YARN and MR's JACC report for binary and source incompatibilities > * run the [PB > differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405] > that Sean wrote for HDFS-11096 for the YARN PBs. > * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are > automated and something we can run upstream. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
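To make the "changed return type" items above concrete, here is a small sketch of why that is a binary incompatibility even when callers still compile from source. The old and new method shapes are assumptions for illustration, not the exact YarnClient signatures.
{code}
// Sketch of a binary-incompatible return-type change, with assumed interfaces.
import java.util.List;
import java.util.Set;

interface OldClient { Set<String> getClusterNodeLabels(); }   // 2.x-era shape (assumption)
interface NewClient { List<String> getClusterNodeLabels(); }  // 3.x-era shape (assumption)

class Caller {
  // Compiled against OldClient, this call site embeds the descriptor
  // ()Ljava/util/Set; in the bytecode. Run against a class with the new
  // shape, the JVM throws NoSuchMethodError even though recompiling from
  // source would succeed.
  static int count(OldClient c) {
    return c.getClusterNodeLabels().size();
  }
}
{code}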
[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213448#comment-16213448 ] Suma Shivaprasad commented on YARN-7117: Attached a doc depicting the workflow and classes for Auto queue creation and Capacity Management for these queues > Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue > Mapping > -- > > Key: YARN-7117 > URL: https://issues.apache.org/jira/browse/YARN-7117 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: > YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, > YARN-7117.poc.1.patch, YARN-7117.poc.patch, YARN-7117_Workflow.pdf > > > Currently Capacity Scheduler doesn't support auto creation of queues when > doing queue mapping. We saw more and more use cases which has complex queue > mapping policies configured to handle application to queues mapping. > The most common use case of CapacityScheduler queue mapping is to create one > queue for each user/group. However update {{capacity-scheduler.xml}} and > {{RMAdmin:refreshQueues}} needs to be done when new user/group onboard. One > of the option to solve the problem is automatically create queues when new > user/group arrives. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-7117: --- Attachment: YARN-7117_Workflow.pdf > Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue > Mapping > -- > > Key: YARN-7117 > URL: https://issues.apache.org/jira/browse/YARN-7117 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: > YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, > YARN-7117.poc.1.patch, YARN-7117.poc.patch, YARN-7117_Workflow.pdf > > > Currently Capacity Scheduler doesn't support auto creation of queues when > doing queue mapping. We saw more and more use cases which has complex queue > mapping policies configured to handle application to queues mapping. > The most common use case of CapacityScheduler queue mapping is to create one > queue for each user/group. However update {{capacity-scheduler.xml}} and > {{RMAdmin:refreshQueues}} needs to be done when new user/group onboard. One > of the option to solve the problem is automatically create queues when new > user/group arrives. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7217) Improve API service usability for updating service spec and state
[ https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213442#comment-16213442 ] Hadoop QA commented on YARN-7217: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 34 new or modified test files. {color} | || || || || {color:brown} yarn-native-services Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 56s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 58s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 13s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 22s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 41s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 18s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in yarn-native-services has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s{color} | {color:green} yarn-native-services passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 4s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 31s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 40s{color} | {color:orange} root: The patch generated 10 new + 247 unchanged - 9 fixed = 257 total (was 256) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 14s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s{color} | {color:green} hadoop-project in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 20s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 50s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 12s{color} | {color:green} hadoop-yarn-services-api
[jira] [Updated] (YARN-7326) Some issues in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7326: Attachment: YARN-7326.yarn-native-services.003.patch Fix error code handling. Some error code was not handled correctly for non existed domain and unauthorized domain. > Some issues in RegistryDNS > -- > > Key: YARN-7326 > URL: https://issues.apache.org/jira/browse/YARN-7326 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Eric Yang > Attachments: YARN-7326.yarn-native-services.001.patch, > YARN-7326.yarn-native-services.002.patch, > YARN-7326.yarn-native-services.003.patch > > > [~aw] helped to identify these issues: > Now some general bad news, not related to this patch: > Ran a few queries, but this one is a bit concerning: > {code} > root@ubuntu:/hadoop/logs# dig @localhost -p 54 . > ;; Warning: query response not set > ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 . > ; (2 servers found) > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794 > ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 > ;; WARNING: recursion requested but not available > ;; Query time: 0 msec > ;; SERVER: 127.0.0.1#54(127.0.0.1) > ;; WHEN: Thu Oct 12 16:04:54 PDT 2017 > ;; MSG SIZE rcvd: 12 > root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr . > ;; Connection to ::1#54(::1) for . failed: connection refused. > ;; communications error to 127.0.0.1#54: end of file > root@ubuntu:/hadoop/logs# > {code} > It looks like it effectively fails when asked about a root zone, which is bad. > It's also kind of interesting in what it does and doesn't log. Probably > should be configured to rotate logs based on size not date. > The real showstopper though: RegistryDNS basically eats a core. It is running > with 100% cpu utilization with and without jsvc. On my laptop, this is > triggering my fan. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
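The comment above is about picking the right DNS response code for lookups the server cannot answer. The sketch below illustrates that kind of rcode selection using the dnsjava Rcode constants that RegistryDNS builds on; the outcome enum and the mapping are assumptions for illustration, not the contents of patch 003.
{code}
// Illustration of mapping lookup outcomes to DNS rcodes (org.xbill.DNS.Rcode).
import org.xbill.DNS.Rcode;

final class RcodeMappingSketch {
  enum LookupOutcome { FOUND, NAME_NOT_FOUND, NOT_OUR_ZONE, REFUSED }

  static int toRcode(LookupOutcome outcome) {
    switch (outcome) {
      case FOUND:          return Rcode.NOERROR;
      case NAME_NOT_FOUND: return Rcode.NXDOMAIN;  // name missing from a zone we serve
      case NOT_OUR_ZONE:   return Rcode.NOTAUTH;   // e.g. a query for "." against RegistryDNS
      default:             return Rcode.REFUSED;   // query not permitted (e.g. AXFR)
    }
  }
}
{code}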
[jira] [Commented] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213332#comment-16213332 ] Hadoop QA commented on YARN-7374: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 0s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 7 unchanged - 1 fixed = 7 total (was 8) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 50s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}138m 4s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue | | | hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService | | Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7374 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893323/YARN-7374.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 26172b0bc3eb 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64
[jira] [Commented] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky
[ https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213319#comment-16213319 ] Hudson commented on YARN-7372: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13119 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13119/]) YARN-7372. (haibochen: rev 480187aebbc13547af06684820a416d22e7c4649) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerQueuing.java > TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic > is flaky > > > Key: YARN-7372 > URL: https://issues.apache.org/jira/browse/YARN-7372 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Labels: unit-test > Attachments: YARN-7372.00.patch, YARN-7372.01.patch > > > testContainerUpdateExecTypeGuaranteedToOpportunistic waits for the container > to be running before it sends container update request. > The container update is handled asynchronously in node manager, and it does > not trigger visible state transition. If the node manager event > dispatch thread is slow, the unit test can fail at the the assertion > {code} Assert.assertEquals(ExecutionType.OPPORTUNISTIC, > status.getExecutionType());{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
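Since the update is applied asynchronously by the NM dispatcher, the usual remedy for this kind of flakiness is to poll for the updated state instead of asserting once. A hedged sketch of that pattern using Hadoop's GenericTestUtils follows; the helper names around the wait are simplified stand-ins, not the exact TestContainerSchedulerQueuing code.
{code}
// Sketch of the polling pattern typically used to deflake async assertions.
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.test.GenericTestUtils;
import org.apache.hadoop.yarn.api.records.ExecutionType;

class DeflakeSketch {
  interface ContainerStatusSource {
    ExecutionType currentExecutionType();
  }

  void waitForExecTypeUpdate(final ContainerStatusSource source)
      throws InterruptedException, TimeoutException {
    // Poll every 100 ms, for up to 10 s, until the NM dispatcher has applied
    // the update, instead of asserting immediately after sending it.
    GenericTestUtils.waitFor(
        () -> source.currentExecutionType() == ExecutionType.OPPORTUNISTIC,
        100, 10_000);
  }
}
{code}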
[jira] [Comment Edited] (YARN-7373) The atomicity of container update in RM is not clear
[ https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213280#comment-16213280 ] Arun Suresh edited comment on YARN-7373 at 10/20/17 9:40 PM: - [~haibochen] / [~miklos.szeg...@cloudera.com] So, like I mentioned in the earlier JIRA, what we have in trunk currently is mostly atomic because: # the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} method in the SchedulerApplicationAttempt - during which the thread has acquired a write lock on the application. You don't need a lock on the queue and since there are no changes to the node, there is not need for that either. # The only concurrent action that can happen, is that the Node where the Container is running might have heart-beaten in - but that operation, releaseContainer, tries to take a lock on the app too, which will have to contend with the writelock acquired in {{pullNewlyUpdatedContainers}} - so we are good there # It is possible that multiple container update requests (say container increase requests) for containers running on the same node can come in concurrently - but the flow is such that the actual resource allocation for the update is internally treated as a new (temporary) container container allocation - and just like any normal container allocation in the scheduler, they are serialized. # It is possible that multiple container requests for the SAME container can come in too - but we have a container version that takes care of that. Although - I do have to mention, that the code you pasted above - which is part of the changes in YARN-4511 can cause a few problems, since you are updating the node as well, and you might need a lock on the node before you do that. was (Author: asuresh): [~haibochen] / [~miklos.szeg...@cloudera.com] So, like I mentioned in the earlier JIRA, what we have in trunk currently is mostly atomic because: # the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} method in the SchedulerApplicationAttempt - during which the thread has acquired a write lock on the application. You don't need a lock on the queue and since there are no changes to the node, there is not need for that either. # The only concurrent action that can happen, is that the Node where the Container is running might have heart-beaten in - but that operation, releaseContainer, tries to take a lock on the app too, which will have to contend with the writelock acquired in {{pullNewlyUpdatedContainers}} - so we are good there # It is possible that multiple container update requests (say container increase requests) for containers running on the same node can come in concurrently - but the flow is such that the actual resource allocation for the update is internally treated as a new (temporary) container container allocation - and just like any normal container allocation in the scheduler, they are serialized. # It is possible that multiple container requests for the SAME container can come in too - but we have a container version that takes care of that. 
> The atomicity of container update in RM is not clear > > > Key: YARN-7373 > URL: https://issues.apache.org/jira/browse/YARN-7373 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > > While reviewing YARN-4511, Miklos noticed that > {code:java} > 342 // notify schedulerNode of the update to correct resource accounting > 343 node.containerUpdated(existingRMContainer, existingContainer); > 344 > 345 > ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); > 346 // notify SchedulerNode of the update to correct resource accounting > 347 node.containerUpdated(tempRMContainer, tempContainer); > 348 > {code} > bq. I think that it would be nicer to lock around these two calls to become > atomic. > Container update, and thus container swap as part of that, is atomic > according to [~asuresh]. > It'd be nice to discuss this in more details to see if we want to be more > conservative. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
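The locking argument in the comment above boils down to the pattern sketched below: the AM-side pull of updated containers and the node-heartbeat release path both serialize on the same per-application lock, so the swap cannot interleave with a release. This is a simplified illustration using the method names from the comment, not the trunk implementation.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch: both paths take the same per-application write lock,
// so swapContainer(...) and releaseContainer(...) cannot overlap.
class SchedulerApplicationAttemptSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  List<Object> pullNewlyUpdatedContainers() {     // AM allocate path
    lock.writeLock().lock();
    try {
      // swapContainer(...) runs here, under the application write lock
      return new ArrayList<>();
    } finally {
      lock.writeLock().unlock();
    }
  }

  void releaseContainer(Object rmContainer) {     // node heartbeat path
    lock.writeLock().lock();                      // contends with the pull above
    try {
      // release bookkeeping
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}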
[jira] [Commented] (YARN-7373) The atomicity of container update in RM is not clear
[ https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213280#comment-16213280 ] Arun Suresh commented on YARN-7373: --- [~haibochen] / [~miklos.szeg...@cloudera.com] So, like I mentioned in the earlier JIRA, what we have in trunk currently is mostly atomic because: # the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} method in the SchedulerApplicationAttempt - during which the thread has acquired a write lock on the application. You don't need a lock on the queue and since there are no changes to the node, there is not need for that either. # The only concurrent action that can happen, is that the Node where the Container is running might have heart-beaten in - but that operation, releaseContainer, tries to take a lock on the app too, which will have to contend with the writelock acquired in {{pullNewlyUpdatedContainers}} - so we are good there # It is possible that multiple container update requests (say container increase requests) for containers running on the same node can come in concurrently - but the flow is such that the actual resource allocation for the update is internally treated as a new (temporary) container container allocation - and just like any normal container allocation in the scheduler, they are serialized. # It is possible that multiple container requests for the SAME container can come in too - but we have a container version that takes care of that. > The atomicity of container update in RM is not clear > > > Key: YARN-7373 > URL: https://issues.apache.org/jira/browse/YARN-7373 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > > While reviewing YARN-4511, Miklos noticed that > {code:java} > 342 // notify schedulerNode of the update to correct resource accounting > 343 node.containerUpdated(existingRMContainer, existingContainer); > 344 > 345 > ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); > 346 // notify SchedulerNode of the update to correct resource accounting > 347 node.containerUpdated(tempRMContainer, tempContainer); > 348 > {code} > bq. I think that it would be nicer to lock around these two calls to become > atomic. > Container update, and thus container swap as part of that, is atomic > according to [~asuresh]. > It'd be nice to discuss this in more details to see if we want to be more > conservative. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky
[ https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213265#comment-16213265 ] Haibo Chen commented on YARN-7372: -- Thanks [~asuresh] for the review! Will check it in shortly. > TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic > is flaky > > > Key: YARN-7372 > URL: https://issues.apache.org/jira/browse/YARN-7372 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Labels: unit-test > Attachments: YARN-7372.00.patch, YARN-7372.01.patch > > > testContainerUpdateExecTypeGuaranteedToOpportunistic waits for the container > to be running before it sends container update request. > The container update is handled asynchronously in node manager, and it does > not trigger visible state transition. If the node manager event > dispatch thread is slow, the unit test can fail at the the assertion > {code} Assert.assertEquals(ExecutionType.OPPORTUNISTIC, > status.getExecutionType());{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213254#comment-16213254 ] Yufei Gu commented on YARN-7374: Thanks for working on this [~templedf]. The patch looks good to me generally. Some nits: - Would you like to publish the performance comparison result? - remove "* @param n the number of resource types" for method {{compare2()}} - Two empty lines before "// A queue is needy for its min share if its dominant resource". - Code would be cleaner if putting method {{compare2}} and its support methods to a separated class. - Maybe a good idea to add comment to indicate how to get non-dominate index, or a new method like {{getNonDominateIndex(int dominant) { return 1 - dominant}}}. - Could these code be put into separated method? Since it is invoked several times. {code} if (res == 0) { // Apps are tied in fairness ratio. Break the tie by submit time and job // name to get a deterministic ordering, which is useful for unit tests. res = (int) Math.signum(s1.getStartTime() - s2.getStartTime()); if (res == 0) { res = s1.getName().compareTo(s2.getName()); } } {code} > Improve performance of DRF comparisons for resource types in fair scheduler > --- > > Key: YARN-7374 > URL: https://issues.apache.org/jira/browse/YARN-7374 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-7374.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
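On the last review point, the repeated tie-breaking block could be factored into a single helper roughly as sketched below; the helper name is illustrative, while {{Schedulable}}, {{getStartTime()}} and {{getName()}} are the same APIs used in the quoted snippet.

{code:java}
// Illustrative extraction of the repeated tie-breaker, not the actual patch.
private int breakTie(Schedulable s1, Schedulable s2) {
  // Break the tie by submit time and job name to get a deterministic
  // ordering, which is useful for unit tests.
  int res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
  if (res == 0) {
    res = s1.getName().compareTo(s2.getName());
  }
  return res;
}

// Each call site then collapses to:
//   if (res == 0) {
//     res = breakTie(s1, s2);
//   }
{code}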
[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213251#comment-16213251 ] Ray Chiang commented on YARN-6142: -- I'm done with the JACC analysis, but need to do the same type of writeup that was done for protobuf. The quick answer is that we don't have any major red flags, but I'm going to note some potential incompatibilities that are very minor, but could affect some random API user out there. > Support rolling upgrade between 2.x and 3.x > --- > > Key: YARN-6142 > URL: https://issues.apache.org/jira/browse/YARN-6142 > Project: Hadoop YARN > Issue Type: Task > Components: rolling upgrade >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: Ray Chiang >Priority: Blocker > > Counterpart JIRA to HDFS-11096. We need to: > * examine YARN and MR's JACC report for binary and source incompatibilities > * run the [PB > differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405] > that Sean wrote for HDFS-11096 for the YARN PBs. > * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are > automated and something we can run upstream. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7170) Improve bower dependencies for YARN UI v2
[ https://issues.apache.org/jira/browse/YARN-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-7170: - Fix Version/s: 2.9.0 > Improve bower dependencies for YARN UI v2 > - > > Key: YARN-7170 > URL: https://issues.apache.org/jira/browse/YARN-7170 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Sunil G >Assignee: Sunil G >Priority: Critical > Fix For: 2.9.0, 3.0.0 > > Attachments: YARN-7170.001.patch, YARN-7170.002.patch > > > [INFO] bower ember#2.2.0 progress Receiving > objects: 50% (38449/75444), 722.46 MiB | 3.30 MiB/s > ... > [INFO] bower ember#2.2.0 progress Receiving > objects: 99% (75017/75444), 1.56 GiB | 3.31 MiB/s > Investigate the dependencies and reduce the download size and speed of > compilation. > cc/ [~Sreenath] and [~akhilpb] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7338) Support same origin policy for cross site scripting prevention.
[ https://issues.apache.org/jira/browse/YARN-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-7338: - Fix Version/s: 2.9.0 > Support same origin policy for cross site scripting prevention. > --- > > Key: YARN-7338 > URL: https://issues.apache.org/jira/browse/YARN-7338 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Vrushali C >Assignee: Sunil G > Fix For: 2.9.0, 3.0.0, 3.1.0 > > Attachments: YARN-7338.001.patch > > > Opening jira as suggested b [~eyang] on the thread for merging YARN-3368 (new > web UI) to branch2 > http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201610.mbox/%3ccad++ecmvvqnzqz9ynkvkcxaczdkg50yiofxktgk3mmms9sh...@mail.gmail.com%3E > -- > Ui2 does not seem to support same origin policy for cross site scripting > prevention. > The following parameters has no effect for /ui2: > hadoop.http.cross-origin.enabled = true > yarn.resourcemanager.webapp.cross-origin.enabled = true > This is because ui2 is designed as a separate web application. WebFilters > setup for existing resource manager doesn’t apply to the new web application. > Please open JIRA to track the security issue and resolve the problem prior to > backporting this to branch-2. > This would minimize the risk to open up security hole in branch-2. > -- -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky
[ https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213215#comment-16213215 ] Arun Suresh commented on YARN-7372: --- +1, Thanks [~haibochen] > TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic > is flaky > > > Key: YARN-7372 > URL: https://issues.apache.org/jira/browse/YARN-7372 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Labels: unit-test > Attachments: YARN-7372.00.patch, YARN-7372.01.patch > > > testContainerUpdateExecTypeGuaranteedToOpportunistic waits for the container > to be running before it sends container update request. > The container update is handled asynchronously in node manager, and it does > not trigger visible state transition. If the node manager event > dispatch thread is slow, the unit test can fail at the the assertion > {code} Assert.assertEquals(ExecutionType.OPPORTUNISTIC, > status.getExecutionType());{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213197#comment-16213197 ] Subru Krishnan commented on YARN-7276: -- Thanks [~elgoiri] for the fixes. I looked at it and it is mostly good; minor comments below: * DefaultMetricsSystem initialization seems to be missing in the patch. * Add tests to check empty states and labels? * Would it be possible to have a multi-threaded test? * Nit: {{FederationInterceptorREST::getCopy}} --> {{FederationInterceptorREST::Clone}} and mention in the comment that this is for thread safety. > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: YARN-7276.000.patch, YARN-7276.001.patch, > YARN-7276.002.patch > > > While testing YARN-3661, I found a few issues with the REST interface in the > Router: > * No support for empty content (error 204) > * Media type support > * Attributes in {{FederationInterceptorREST}} > * Support for empty states and labels > * DefaultMetricsSystem initialization is missing -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
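The rename suggested in the last nit is about making the defensive copy explicit: each caller gets its own object, so concurrent REST requests cannot see one another's mutations. A minimal sketch of that idea, using a stand-in DAO rather than the actual Router classes:

{code:java}
// Stand-in example of a defensive copy for thread safety; the DAO and its
// fields are assumptions, not the real FederationInterceptorREST types.
final class CloneSketch {
  static final class NodeInfoDao {
    String nodeId;
    long usedMemoryMB;
  }

  /** Return a fresh copy so callers never share the cached instance. */
  static NodeInfoDao clone(NodeInfoDao original) {
    NodeInfoDao copy = new NodeInfoDao();
    copy.nodeId = original.nodeId;
    copy.usedMemoryMB = original.usedMemoryMB;
    return copy;
  }

  private CloneSketch() {
  }
}
{code}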
[jira] [Commented] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky
[ https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213194#comment-16213194 ] Hadoop QA commented on YARN-7372: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 15s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 54m 39s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7372 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893295/YARN-7372.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 118a676353a1 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6b7c87c | | Default Java | 1.8.0_131 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/18058/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18058/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18058/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. >
[jira] [Assigned] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan reassigned YARN-7276: Assignee: Íñigo Goiri (was: Giovanni Matteo Fumarola) > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri > Attachments: YARN-7276.000.patch, YARN-7276.001.patch, > YARN-7276.002.patch > > > While testing YARN-3661, I found a few issues with the REST interface in the > Router: > * No support for empty content (error 204) > * Media type support > * Attributes in {{FederationInterceptorREST}} > * Support for empty states and labels > * DefaultMetricsSystem initialization is missing -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213177#comment-16213177 ] Andrew Wang commented on YARN-6142: --- Hi Ray, what is left to do here? Is it tracking towards completion by the end of the month? > Support rolling upgrade between 2.x and 3.x > --- > > Key: YARN-6142 > URL: https://issues.apache.org/jira/browse/YARN-6142 > Project: Hadoop YARN > Issue Type: Task > Components: rolling upgrade >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: Ray Chiang >Priority: Blocker > > Counterpart JIRA to HDFS-11096. We need to: > * examine YARN and MR's JACC report for binary and source incompatibilities > * run the [PB > differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405] > that Sean wrote for HDFS-11096 for the YARN PBs. > * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are > automated and something we can run upstream. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213173#comment-16213173 ] Hadoop QA commented on YARN-7276: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 0s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 38m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7276 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893303/YARN-7276.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 67718d83eb09 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6b7c87c | | Default Java | 1.8.0_131 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18059/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18059/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Giovanni Matteo Fumarola >
[jira] [Commented] (YARN-7178) Add documentation for Container Update API
[ https://issues.apache.org/jira/browse/YARN-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213164#comment-16213164 ] Andrew Wang commented on YARN-7178: --- Ping, is this one tracking towards completion by the end of the month? It's marked as a blocker. > Add documentation for Container Update API > -- > > Key: YARN-7178 > URL: https://issues.apache.org/jira/browse/YARN-7178 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7355) TestDistributedShell should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213160#comment-16213160 ] Haibo Chen commented on YARN-7355: -- Thanks @Yufei for the review! > TestDistributedShell should be scheduler agnostic > -- > > Key: YARN-7355 > URL: https://issues.apache.org/jira/browse/YARN-7355 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Fix For: 2.9.0, 3.0.0, 3.1.0 > > Attachments: YARN-7355.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7351) High CPU usage issue in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212937#comment-16212937 ] Eric Yang edited comment on YARN-7351 at 10/20/17 7:55 PM: --- -1 after applying patch 003, query started failing when it is used in combination with patch for YARN-7326. {code} [yarn@eyang-1 hadoop-3.1.0-SNAPSHOT]$ dig @localhost -p 5353 . ;; Warning: query response not set ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @localhost -p 5353 . ; (2 servers found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 48353 ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; WARNING: recursion requested but not available ;; Query time: 9 msec ;; SERVER: 127.0.0.1#5353(127.0.0.1) ;; WHEN: Fri Oct 20 19:49:49 UTC 2017 ;; MSG SIZE rcvd: 12 {code} This is because the response payload is bigger than UDP datagram. TCP channel for response is working using the initialized NIOTCPChannel. was (Author: eyang): +1 for disabling TCP channel for now. > High CPU usage issue in RegistryDNS > --- > > Key: YARN-7351 > URL: https://issues.apache.org/jira/browse/YARN-7351 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-7351.yarn-native-services.01.patch, > YARN-7351.yarn-native-services.02.patch, > YARN-7351.yarn-native-services.03.patch, > YARN-7351.yarn-native-services.03.patch > > > Thanks [~aw] for finding this issue. > The current RegistryDNS implementation is always running on high CPU and > pretty much eats one core. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
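The failure mode above is the standard DNS payload limit: a plain UDP answer is capped at 512 bytes (more only if the client advertises a larger size via EDNS), so an oversized response must either be truncated with the TC bit set, prompting the client to retry over TCP, or be served on the TCP channel directly. A library-free sketch of just that decision rule (RegistryDNS itself is built on dnsjava; this only expresses the protocol constraint in plain Java):

{code:java}
// Sketch of the DNS transport rule only; not RegistryDNS code.
final class DnsTransportRule {
  static final int PLAIN_UDP_LIMIT = 512;   // RFC 1035 default UDP payload size

  /** @return true if the wire-format response fits the client's UDP limit. */
  static boolean fitsInUdp(byte[] wireResponse, int advertisedEdnsSize) {
    int limit = Math.max(PLAIN_UDP_LIMIT, advertisedEdnsSize);
    return wireResponse.length <= limit;
  }

  // When it does not fit, either set the TC (truncated) flag in the UDP reply
  // so the resolver retries over TCP, or answer on the TCP channel.
  private DnsTransportRule() {
  }
}
{code}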
[jira] [Commented] (YARN-7355) TestDistributedShell should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213119#comment-16213119 ] Hudson commented on YARN-7355: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13118 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13118/]) YARN-7355. TestDistributedShell should be scheduler agnostic. (yufei: rev 6b7c87c94592606966a4229313b3d0da48f16158) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java > TestDistributedShell should be scheduler agnostic > -- > > Key: YARN-7355 > URL: https://issues.apache.org/jira/browse/YARN-7355 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Fix For: 2.9.0, 3.0.0, 3.1.0 > > Attachments: YARN-7355.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories
[ https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213117#comment-16213117 ] Hudson commented on YARN-7353: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13118 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13118/]) YARN-7353. Improved volume mount check for directories and unit test (eyang: rev b61144a93d9306624378a93944d0a08c60436554) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc > Docker permitted volumes don't properly check for directories > - > > Key: YARN-7353 > URL: https://issues.apache.org/jira/browse/YARN-7353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-7353.001.patch, YARN-7353.002.patch, > YARN-7353.003.patch > > > {noformat:title=docker-util.c:check_mount_permitted()} > // directory check > permitted_mount_len = strlen(permitted_mounts[i]); > if (permitted_mount_len > 0 > && permitted_mounts[i][permitted_mount_len - 1] == '/') { > if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) > == 0) { > ret = 1; > break; > } > } > {noformat} > This code will treat "/home/" as a directory, but not "/home" > {noformat} > [ FAILED ] 3 tests, listed below: > [ FAILED ] TestDockerUtil.test_check_mount_permitted > [ FAILED ] TestDockerUtil.test_normalize_mounts > [ FAILED ] TestDockerUtil.test_add_rw_mounts > {noformat} > Additionally, YARN-6623 introduced new test failures in the C++ > container-executor test "cetest" -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
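For reference, the intended semantics of the check (accept a permitted mount whether it is configured as "/home" or "/home/", and accept any path underneath it) can be written as in the short sketch below. The real fix lives in the C code of {{docker-util.c}}; this Java version is only an illustration of the expected behavior, assuming {{normalizedPath}} carries no trailing slash.

{code:java}
// Illustration of the intended permitted-mount semantics, not the C fix.
static boolean isPermittedMount(String normalizedPath, String permitted) {
  // Normalize away a trailing slash so "/home" and "/home/" behave alike.
  String dir = permitted.endsWith("/")
      ? permitted.substring(0, permitted.length() - 1) : permitted;
  return normalizedPath.equals(dir)              // the directory itself
      || normalizedPath.startsWith(dir + "/");   // anything underneath it
}
{code}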
[jira] [Commented] (YARN-7261) Add debug message for better download latency monitoring
[ https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213118#comment-16213118 ] Hudson commented on YARN-7261: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13118 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13118/]) YARN-7261. Add debug message for better download latency monitoring. (yufei: rev 0799fde35e7f3b9e8a85284ac0b30f6bdcbffad1) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java > Add debug message for better download latency monitoring > > > Key: YARN-7261 > URL: https://issues.apache.org/jira/browse/YARN-7261 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 2.9.0, 3.0.0, 3.1.0 > > Attachments: YARN-7261.001.patch, YARN-7261.002.patch, > YARN-7261.003.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-7374: --- Attachment: YARN-7374.001.patch > Improve performance of DRF comparisons for resource types in fair scheduler > --- > > Key: YARN-7374 > URL: https://issues.apache.org/jira/browse/YARN-7374 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-7374.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler
Daniel Templeton created YARN-7374: -- Summary: Improve performance of DRF comparisons for resource types in fair scheduler Key: YARN-7374 URL: https://issues.apache.org/jira/browse/YARN-7374 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Affects Versions: 3.1.0 Reporter: Daniel Templeton Assignee: Daniel Templeton Priority: Critical -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations
[ https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213028#comment-16213028 ] Haibo Chen commented on YARN-4511: -- YARN-7373 is created for the container update atomicity discussion. > Common scheduler changes supporting scheduler-specific implementations > -- > > Key: YARN-4511 > URL: https://issues.apache.org/jira/browse/YARN-4511 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Haibo Chen > Attachments: YARN-4511-YARN-1011.00.patch, > YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, > YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, > YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7373) The atomicity of container update in RM is not clear
[ https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213026#comment-16213026 ] Haibo Chen commented on YARN-7373: -- [~asuresh] Can you please provide some background and details of the container update? The atomicity is not clear to us in terms of how it is guaranteed. Our concern is that another container allocation may come in between the two containerUpdated() calls and there is not enough resource available for the allocation. > The atomicity of container update in RM is not clear > > > Key: YARN-7373 > URL: https://issues.apache.org/jira/browse/YARN-7373 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > > While reviewing YARN-4511, Miklos noticed that > {code:java} > 342 // notify schedulerNode of the update to correct resource accounting > 343 node.containerUpdated(existingRMContainer, existingContainer); > 344 > 345 > ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); > 346 // notify SchedulerNode of the update to correct resource accounting > 347 node.containerUpdated(tempRMContainer, tempContainer); > 348 > {code} > bq. I think that it would be nicer to lock around these two calls to become > atomic. > Container update, and thus container swap as part of that, is atomic > according to [~asuresh]. > It'd be nice to discuss this in more details to see if we want to be more > conservative. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
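What the quoted review comment asks for amounts to the sketch below: perform both accounting updates on the node as one indivisible step with respect to any concurrent allocation on that node. The names are reused from the quoted snippet, and the locking choice is only illustrative, not the actual RM code.

{code:java}
// Hedged sketch of the reviewer's suggestion: hold the SchedulerNode's lock
// across both updates so a concurrent allocation on the same node cannot
// observe the half-swapped accounting.
synchronized (node) {
  // notify SchedulerNode of the update to correct resource accounting
  node.containerUpdated(existingRMContainer, existingContainer);

  ((RMContainerImpl) tempRMContainer).setContainer(updatedTempContainer);
  // notify SchedulerNode of the update to correct resource accounting
  node.containerUpdated(tempRMContainer, tempContainer);
}
{code}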
[jira] [Updated] (YARN-7373) The atomicity of container update in RM is not clear
[ https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-7373: - Description: While reviewing YARN-4511, Miklos noticed that {code:java} 342 // notify schedulerNode of the update to correct resource accounting 343 node.containerUpdated(existingRMContainer, existingContainer); 344 345 ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); 346 // notify SchedulerNode of the update to correct resource accounting 347 node.containerUpdated(tempRMContainer, tempContainer); 348 {code} bq. I think that it would be nicer to lock around these two calls to become atomic. Container update, and thus container swap as part of that, is atomic according to [~asuresh]. It'd be nice to discuss this in more details to see if we want to be more conservative. was: While reviewing YARN-4511, Miklos pointed out that {code:java} 342 // notify schedulerNode of the update to correct resource accounting 343 node.containerUpdated(existingRMContainer, existingContainer); 344 345 ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); 346 // notify SchedulerNode of the update to correct resource accounting 347 node.containerUpdated(tempRMContainer, tempContainer); 348 {code} bq. I think that it would be nicer to lock around these two calls to become atomic. > The atomicity of container update in RM is not clear > > > Key: YARN-7373 > URL: https://issues.apache.org/jira/browse/YARN-7373 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > > While reviewing YARN-4511, Miklos noticed that > {code:java} > 342 // notify schedulerNode of the update to correct resource accounting > 343 node.containerUpdated(existingRMContainer, existingContainer); > 344 > 345 > ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); > 346 // notify SchedulerNode of the update to correct resource accounting > 347 node.containerUpdated(tempRMContainer, tempContainer); > 348 > {code} > bq. I think that it would be nicer to lock around these two calls to become > atomic. > Container update, and thus container swap as part of that, is atomic > according to [~asuresh]. > It'd be nice to discuss this in more details to see if we want to be more > conservative. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7370) Intra-queue preemption properties should be refreshable
[ https://issues.apache.org/jira/browse/YARN-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213017#comment-16213017 ] Wangda Tan commented on YARN-7370: -- [~GergelyNovak], you're very much welcome to take up this task, this is quite helpful and important for user to use preemption. I agree with Eric, it's better to include this in {{-refreshQueues}} op so we don't need any changes to RMAdmin protocol and CLI. To me the requirement is: - Handle changes to {{SchedulingEditPolicy}} configs including preemption (which means {{SchedulingMonitor}} should be refreshable as well). - All preemption-related parameters. [~eepayne]/[~sunilg], please feel free to add any requirement in your mind. > Intra-queue preemption properties should be refreshable > --- > > Key: YARN-7370 > URL: https://issues.apache.org/jira/browse/YARN-7370 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, scheduler preemption >Affects Versions: 2.8.0, 3.0.0-alpha3 >Reporter: Eric Payne > > At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} > should be refreshable. It would also be nice to make > {{intra-queue-preemption.enabled}} and {{preemption-order-policy}} > refreshable. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-7373) The atomicity of container update in RM is not clear
Haibo Chen created YARN-7373: Summary: The atomicity of container update in RM is not clear Key: YARN-7373 URL: https://issues.apache.org/jira/browse/YARN-7373 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Haibo Chen Assignee: Haibo Chen While reviewing YARN-4511, Miklos pointed out that {code:java} 342 // notify schedulerNode of the update to correct resource accounting 343 node.containerUpdated(existingRMContainer, existingContainer); 344 345 ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); 346 // notify SchedulerNode of the update to correct resource accounting 347 node.containerUpdated(tempRMContainer, tempContainer); 348 {code} bq. I think that it would be nicer to lock around these two calls to become atomic. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations
[ https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213013#comment-16213013 ] Haibo Chen commented on YARN-4511: -- bq. If containerResourceAllocated fails in guaranteedContainerResourceAllocated we will still call allocatedContainers.put(). I think this may cause some inconsistencies in the future. Probably it is better to propagate the false return code all the way to the caller. bq. guaranteedContainerResourceReleased may fail inside but regardless of the outcome, we decrease numGuaranteedContainers. These two are the current behavior without the patch. The resource release can fail only if resource is null, in which case is equivalent to releasing a zero-sized container, but it won't cause any inconsistency. bq. I think that it would be nicer to lock around these two calls to become atomic. That's a valid concern. container update and thus swap is atomic according to [~asuresh]. But that is indeed not very clear. Let's discuss this in another jira to see if we can improve it. Will address the rest of your comments in the next patch plus unit tests. > Common scheduler changes supporting scheduler-specific implementations > -- > > Key: YARN-4511 > URL: https://issues.apache.org/jira/browse/YARN-4511 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Haibo Chen > Attachments: YARN-4511-YARN-1011.00.patch, > YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, > YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, > YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
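On the first point, propagating the failure could look roughly like the sketch below; the method and field names mirror the ones discussed above, the rest is assumed, and this is not the actual YARN-4511 patch.

{code:java}
// Hedged sketch: do not record the container when the resource accounting
// call fails; surface the false return to the caller instead.
private boolean guaranteedContainerResourceAllocated(
    RMContainer rmContainer, Resource resource) {
  if (!containerResourceAllocated(resource, guaranteedResourceUsed)) {
    return false;   // caller decides how to handle the failed accounting
  }
  allocatedContainers.put(rmContainer.getContainerId(), rmContainer);
  numGuaranteedContainers++;
  return true;
}
{code}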
[jira] [Commented] (YARN-4163) Audit getQueueInfo and getApplications calls
[ https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213011#comment-16213011 ] Jason Lowe commented on YARN-4163: -- Thanks for updating the patch! +1 lgtm. > Audit getQueueInfo and getApplications calls > > > Key: YARN-4163 > URL: https://issues.apache.org/jira/browse/YARN-4163 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4163.004.patch, YARN-4163.005.patch, > YARN-4163.006.branch-2.8.patch, YARN-4163.006.patch, YARN-4163.007.patch, > YARN-4163.2.patch, YARN-4163.2.patch, YARN-4163.3.patch, YARN-4163.patch > > > getQueueInfo and getApplications seem to sometimes cause spike of load but > not able to confirm due to they are not audit logged. This patch propose to > add them to audit log -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7217) Improve API service usability for updating service spec and state
[ https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213006#comment-16213006 ] Eric Yang edited comment on YARN-7217 at 10/20/17 6:24 PM: --- - Fixed actionBuild to deploy config to solr. - Fixed PUT method for state for service to be in sync with code from HEAD of yarn-native-services. - Fixed Solr version definition in hadoop-project/pom.xml was (Author: eyang): - Fixed actionBuild to deploy config to solr. - Fixed PUT method for state for service to be in sync with code from HEAD of yarn-native-services. > Improve API service usability for updating service spec and state > - > > Key: YARN-7217 > URL: https://issues.apache.org/jira/browse/YARN-7217 > Project: Hadoop YARN > Issue Type: Task > Components: api, applications >Reporter: Eric Yang >Assignee: Eric Yang > Attachments: YARN-7217.yarn-native-services.001.patch, > YARN-7217.yarn-native-services.002.patch, > YARN-7217.yarn-native-services.003.patch, > YARN-7217.yarn-native-services.004.patch, > YARN-7217.yarn-native-services.005.patch > > > API service for deploy, and manage YARN services have several limitations. > {{updateService}} API provides multiple functions: > # Stopping a service. > # Start a service. > # Increase or decrease number of containers. (This was removed in YARN-7323). > The overloading is buggy depending on how the configuration should be applied. > h4. Scenario 1 > A user retrieves Service object from getService call, and the Service object > contains state: STARTED. The user would like to increase number of > containers for the deployed service. The JSON has been updated to increase > container count. The PUT method does not actually increase container count. > h4. Scenario 2 > A user retrieves Service object from getService call, and the Service object > contains state: STOPPED. The user would like to make a environment > configuration change. The configuration does not get updated after PUT > method. > This is possible to address by rearranging the logic of START/STOP after > configuration update. However, there are other potential combinations that > can break PUT method. For example, user like to make configuration changes, > but not yet restart the service until a later time. > h4. Scenario 3 > There is no API to list all deployed applications by the same user. > h4. Scenario 4 > Desired state (spec) and current state are represented by the same Service > object. There is no easy way to identify "state" is desired state to reach > or, the current state of the service. It would be nice to have ability to > retrieve both desired state, and current state with separated entry points. > By implementing /spec and /state, it can resolve this problem. > h4. Scenario 5 > List all services deploy by the same user can trigger a directory listing > operation on namenode if hdfs is used as storage for metadata. When hundred > of users use Service UI to view or deploy applications, this will trigger > denial of services attack on namenode. The sparse small metadata files also > reduce efficiency of Namenode memory usage. Hence, a cache layer for storing > service metadata can reduce namenode stress. > h3. Proposed change > ApiService can separate the PUT method into two PUT methods for configuration > changes vs operation changes. 
New API could look like: > {code} > @PUT > /ws/v1/services/[service_name]/spec > Request Data: > { > "name": "amp", > "components": [ > { > "name": "mysql", > "number_of_containers": 2, > "artifact": { > "id": "centos/mysql-57-centos7:latest", > "type": "DOCKER" > }, > "run_privileged_container": false, > "launch_command": "", > "resource": { > "cpus": 1, > "memory": "2048" > }, > "configuration": { > "env": { > "MYSQL_USER":"${USER}", > "MYSQL_PASSWORD":"password" > } > } > } > ], > "quicklinks": { > "Apache Document Root": > "http://httpd.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/;, > "PHP MyAdmin": "http://phpmyadmin.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/; > } > } > {code} > {code} > @PUT > /ws/v1/services/[service_name]/state > Request data: > { > "name": "amp", > "components": [ > { > "name": "mysql", > "state": "STOPPED" > } > ] > } > {code} > SOLR can be used to cache Yarnfile to improve lookup performance and reduce > stress of namenode small file problems and high frequency lookup. SOLR is > chosen for caching metadata because its indexing feature can be used to build > full text search for application catalog as well. > For service that requires configuration changes to increase or
[jira] [Updated] (YARN-7217) Improve API service usability for updating service spec and state
[ https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7217: Attachment: YARN-7217.yarn-native-services.005.patch - Fixed actionBuild to deploy config to solr. - Fixed PUT method for state for service to be in sync with code from HEAD of yarn-native-services. > Improve API service usability for updating service spec and state > - > > Key: YARN-7217 > URL: https://issues.apache.org/jira/browse/YARN-7217 > Project: Hadoop YARN > Issue Type: Task > Components: api, applications >Reporter: Eric Yang >Assignee: Eric Yang > Attachments: YARN-7217.yarn-native-services.001.patch, > YARN-7217.yarn-native-services.002.patch, > YARN-7217.yarn-native-services.003.patch, > YARN-7217.yarn-native-services.004.patch, > YARN-7217.yarn-native-services.005.patch > > > API service for deploy, and manage YARN services have several limitations. > {{updateService}} API provides multiple functions: > # Stopping a service. > # Start a service. > # Increase or decrease number of containers. (This was removed in YARN-7323). > The overloading is buggy depending on how the configuration should be applied. > h4. Scenario 1 > A user retrieves Service object from getService call, and the Service object > contains state: STARTED. The user would like to increase number of > containers for the deployed service. The JSON has been updated to increase > container count. The PUT method does not actually increase container count. > h4. Scenario 2 > A user retrieves Service object from getService call, and the Service object > contains state: STOPPED. The user would like to make a environment > configuration change. The configuration does not get updated after PUT > method. > This is possible to address by rearranging the logic of START/STOP after > configuration update. However, there are other potential combinations that > can break PUT method. For example, user like to make configuration changes, > but not yet restart the service until a later time. > h4. Scenario 3 > There is no API to list all deployed applications by the same user. > h4. Scenario 4 > Desired state (spec) and current state are represented by the same Service > object. There is no easy way to identify "state" is desired state to reach > or, the current state of the service. It would be nice to have ability to > retrieve both desired state, and current state with separated entry points. > By implementing /spec and /state, it can resolve this problem. > h4. Scenario 5 > List all services deploy by the same user can trigger a directory listing > operation on namenode if hdfs is used as storage for metadata. When hundred > of users use Service UI to view or deploy applications, this will trigger > denial of services attack on namenode. The sparse small metadata files also > reduce efficiency of Namenode memory usage. Hence, a cache layer for storing > service metadata can reduce namenode stress. > h3. Proposed change > ApiService can separate the PUT method into two PUT methods for configuration > changes vs operation changes. 
The new API could look like: > {code} > @PUT > /ws/v1/services/[service_name]/spec > Request Data: > { > "name": "amp", > "components": [ > { > "name": "mysql", > "number_of_containers": 2, > "artifact": { > "id": "centos/mysql-57-centos7:latest", > "type": "DOCKER" > }, > "run_privileged_container": false, > "launch_command": "", > "resource": { > "cpus": 1, > "memory": "2048" > }, > "configuration": { > "env": { > "MYSQL_USER":"${USER}", > "MYSQL_PASSWORD":"password" > } > } > } > ], > "quicklinks": { > "Apache Document Root": > "http://httpd.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/", > "PHP MyAdmin": "http://phpmyadmin.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/" > } > } > {code} > {code} > @PUT > /ws/v1/services/[service_name]/state > Request data: > { > "name": "amp", > "components": [ > { > "name": "mysql", > "state": "STOPPED" > } > ] > } > {code} > SOLR can be used to cache the Yarnfile to improve lookup performance and reduce > namenode stress from small-file problems and high-frequency lookups. SOLR is > chosen for caching metadata because its indexing feature can be used to build > full-text search for the application catalog as well. > For a service that requires configuration changes to increase or decrease the node > count, the calling sequence is: > {code} > # GET /ws/v1/services/{service_name}/spec > # Change number_of_containers to the desired number. > # PUT /ws/v1/services/{service_name}/spec to update the spec. > # PUT /ws/v1/services/{service_name}/state to
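As a usage illustration only (not part of the attached patches): a minimal Java sketch of the calling sequence above against the proposed /spec and /state endpoints. The RM API host/port, the service name "amp", and the string-replace edit of the JSON are assumptions for the sketch; a real client would parse and modify the JSON properly.
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the proposed two-endpoint API; host, port and
// service name are placeholders, not values taken from the patch.
public class ServiceSpecClientSketch {
  private static final String BASE = "http://rm-host:8088/ws/v1/services/amp";

  static String call(String method, String path, String body) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(BASE + path).openConnection();
    conn.setRequestMethod(method);
    conn.setRequestProperty("Content-Type", "application/json");
    if (body != null) {
      conn.setDoOutput(true);
      try (OutputStream out = conn.getOutputStream()) {
        out.write(body.getBytes(StandardCharsets.UTF_8));
      }
    }
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    int n;
    try (InputStream in = conn.getInputStream()) {
      while ((n = in.read(chunk)) != -1) {
        buf.write(chunk, 0, n);
      }
    }
    return buf.toString("UTF-8");
  }

  public static void main(String[] args) throws IOException {
    // 1. GET the current spec.
    String spec = call("GET", "/spec", null);
    // 2. Change number_of_containers and PUT the edited spec back.
    String edited = spec.replace("\"number_of_containers\": 2",
        "\"number_of_containers\": 3");
    call("PUT", "/spec", edited);
    // 3. PUT the state change for the component, mirroring the example above.
    call("PUT", "/state",
        "{\"name\":\"amp\",\"components\":[{\"name\":\"mysql\",\"state\":\"STOPPED\"}]}");
  }
}
{code}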
[jira] [Commented] (YARN-7355) TestDistributedShell should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212989#comment-16212989 ] Yufei Gu commented on YARN-7355: +1. > TestDistributedShell should be scheduler agnostic > -- > > Key: YARN-7355 > URL: https://issues.apache.org/jira/browse/YARN-7355 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: YARN-7355.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7169) Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2)
[ https://issues.apache.org/jira/browse/YARN-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212988#comment-16212988 ] Vrushali C commented on YARN-7169: -- Looking at the last two builds, I think things are looking good for the patch. The HDFS test timeouts are unrelated. I will proceed with the merge to branch-2 > Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2) > -- > > Key: YARN-7169 > URL: https://issues.apache.org/jira/browse/YARN-7169 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineclient, timelinereader, timelineserver >Reporter: Vrushali C >Assignee: Vrushali C >Priority: Critical > Attachments: FlowRunDetails_Sleepjob.png, Metrics_Yarn_UI.png, > YARN-7169-YARN-3368_branch2.0001.patch, > YARN-7169-YARN-5355_branch2.0001.patch, > YARN-7169-YARN-5355_branch2.0002.patch, > YARN-7169-YARN-5355_branch2.0003.patch, > YARN-7169-YARN-5355_branch2.0004.patch, > YARN-7169-YARN-5355_branch2.0004.patch, YARN-7169-branch-2.0001.patch, > YARN-7169-branch-2.0002.patch, ui_commits(1) > > > Jira to track the backport of the new yarn-ui onto branch2. Right now adding > into Timeline Service v2's branch2 which is YARN-5355_branch2. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories
[ https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212977#comment-16212977 ] Eric Badger commented on YARN-7353: --- Thanks, [~eyang]! > Docker permitted volumes don't properly check for directories > - > > Key: YARN-7353 > URL: https://issues.apache.org/jira/browse/YARN-7353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-7353.001.patch, YARN-7353.002.patch, > YARN-7353.003.patch > > > {noformat:title=docker-util.c:check_mount_permitted()} > // directory check > permitted_mount_len = strlen(permitted_mounts[i]); > if (permitted_mount_len > 0 > && permitted_mounts[i][permitted_mount_len - 1] == '/') { > if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) > == 0) { > ret = 1; > break; > } > } > {noformat} > This code will treat "/home/" as a directory, but not "/home" > {noformat} > [ FAILED ] 3 tests, listed below: > [ FAILED ] TestDockerUtil.test_check_mount_permitted > [ FAILED ] TestDockerUtil.test_normalize_mounts > [ FAILED ] TestDockerUtil.test_add_rw_mounts > {noformat} > Additionally, YARN-6623 introduced new test failures in the C++ > container-executor test "cetest" -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations
[ https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212971#comment-16212971 ] Haibo Chen commented on YARN-4511: -- bq. however we need to make sure it reflects the state of the object, so for example allocateContainer() should set this value as the last step after the allocatedContainers.put() call. bq. containerResourceReleased should decrease resourceAllocatedPendingLaunch, if the container has not been started, yet. Good points, will address in the next patch. bq. I think that it would be nicer to lock around these two calls to become atomic. swapContainer() is already protected by a writeLock, so it is already atomic, no? bq. isValidGuaranteedContainer and isValidOpportunisticContainer contain the same code. Should they be different? I'm inclined to keep both of them. The caller may want to check whether it is guaranteed or opportunistic, not just whether it has been allocated on the node. It just so happens that we are sharing the same map for both OPPORTUNISTIC and GUARANTEED containers, hence the code is identical. I'll add an ExecutionType check to be more rigorous. bq. allocatedContainers.remove(containerId); can be placed outside the if. {code:java} if (container.getExecutionType() == ExecutionType.GUARANTEED) { guaranteedContainerResourceReleased(container); numGuaranteedContainers--; } else { opportunisticContainerResourceReleased(container); numOpportunisticContainers--; } allocatedContainers.remove(containerId); {code} The above code will update the num*Containers counter before allocatedContainers is updated, so I think we should keep it as is. > Common scheduler changes supporting scheduler-specific implementations > -- > > Key: YARN-4511 > URL: https://issues.apache.org/jira/browse/YARN-4511 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Haibo Chen > Attachments: YARN-4511-YARN-1011.00.patch, > YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, > YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, > YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
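To make the ordering discussed above concrete, a rough, hedged sketch (illustrative class and field names only, not the actual SchedulerNode code) of keeping a per-type counter consistent with the allocated-container map under the node's write lock:
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only: counters are changed in an order consistent with
// the allocated-container map, and both are guarded by the same write lock.
class NodeAccountingSketch {
  private final Map<String, Object> allocatedContainers = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private volatile int numGuaranteedContainers;

  void allocateGuaranteed(String containerId, Object container) {
    lock.writeLock().lock();
    try {
      allocatedContainers.put(containerId, container);
      // Counter is set as the last step, after the map reflects the allocation,
      // so an unlocked reader never sees a count the map cannot account for.
      numGuaranteedContainers++;
    } finally {
      lock.writeLock().unlock();
    }
  }

  void releaseGuaranteed(String containerId) {
    lock.writeLock().lock();
    try {
      // On release the counter is decremented before the entry is removed.
      numGuaranteedContainers--;
      allocatedContainers.remove(containerId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int getNumGuaranteedContainers() {
    return numGuaranteedContainers; // sampled without the lock, as discussed
  }
}
{code}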
[jira] [Commented] (YARN-7326) Some issues in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212967#comment-16212967 ] Eric Yang commented on YARN-7326: - [~jianhe] I will add comments for updateDNSServer method to describe what it does. For testing, try: {code} dig @localhost -p 5353 . dig @localhost -p 5353 google.com. {code} > Some issues in RegistryDNS > -- > > Key: YARN-7326 > URL: https://issues.apache.org/jira/browse/YARN-7326 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Eric Yang > Attachments: YARN-7326.yarn-native-services.001.patch, > YARN-7326.yarn-native-services.002.patch > > > [~aw] helped to identify these issues: > Now some general bad news, not related to this patch: > Ran a few queries, but this one is a bit concerning: > {code} > root@ubuntu:/hadoop/logs# dig @localhost -p 54 . > ;; Warning: query response not set > ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 . > ; (2 servers found) > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794 > ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 > ;; WARNING: recursion requested but not available > ;; Query time: 0 msec > ;; SERVER: 127.0.0.1#54(127.0.0.1) > ;; WHEN: Thu Oct 12 16:04:54 PDT 2017 > ;; MSG SIZE rcvd: 12 > root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr . > ;; Connection to ::1#54(::1) for . failed: connection refused. > ;; communications error to 127.0.0.1#54: end of file > root@ubuntu:/hadoop/logs# > {code} > It looks like it effectively fails when asked about a root zone, which is bad. > It's also kind of interesting in what it does and doesn't log. Probably > should be configured to rotate logs based on size not date. > The real showstopper though: RegistryDNS basically eats a core. It is running > with 100% cpu utilization with and without jsvc. On my laptop, this is > triggering my fan. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7326) Some issues in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212950#comment-16212950 ] Jian He commented on YARN-7326: --- [~eyang], I'm not familiar with the Java DNS libs, could you add some comments in the code to explain what the new method is doing, like the updateDNSServer method? It'll be useful for people who aren't familiar with these libs to understand. And how can I test this change? > Some issues in RegistryDNS > -- > > Key: YARN-7326 > URL: https://issues.apache.org/jira/browse/YARN-7326 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Eric Yang > Attachments: YARN-7326.yarn-native-services.001.patch, > YARN-7326.yarn-native-services.002.patch > > > [~aw] helped to identify these issues: > Now some general bad news, not related to this patch: > Ran a few queries, but this one is a bit concerning: > {code} > root@ubuntu:/hadoop/logs# dig @localhost -p 54 . > ;; Warning: query response not set > ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 . > ; (2 servers found) > ;; global options: +cmd > ;; Got answer: > ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794 > ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 > ;; WARNING: recursion requested but not available > ;; Query time: 0 msec > ;; SERVER: 127.0.0.1#54(127.0.0.1) > ;; WHEN: Thu Oct 12 16:04:54 PDT 2017 > ;; MSG SIZE rcvd: 12 > root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr . > ;; Connection to ::1#54(::1) for . failed: connection refused. > ;; communications error to 127.0.0.1#54: end of file > root@ubuntu:/hadoop/logs# > {code} > It looks like it effectively fails when asked about a root zone, which is bad. > It's also kind of interesting in what it does and doesn't log. Probably > should be configured to rotate logs based on size not date. > The real showstopper though: RegistryDNS basically eats a core. It is running > with 100% cpu utilization with and without jsvc. On my laptop, this is > triggering my fan. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7169) Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2)
[ https://issues.apache.org/jira/browse/YARN-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212943#comment-16212943 ] Hadoop QA commented on YARN-7169: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 45s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 22s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 52s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 12s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 14m 17s{color} | {color:green} branch-2 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-assemblies hadoop-yarn-project/hadoop-yarn . hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 29s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 45s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 28s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 9m 22s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 9m 22s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 1s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 10s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 4s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-assemblies hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 7m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 27m 43s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}181m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ha.TestZKFailoverController | | Timed out junit tests | org.apache.hadoop.http.TestHttpServer | | | org.apache.hadoop.log.TestLogLevel | \\ \\ || Subsystem || Report/Notes || | Docker
[jira] [Commented] (YARN-7351) High CPU usage issue in RegistryDNS
[ https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212937#comment-16212937 ] Eric Yang commented on YARN-7351: - +1 for disabling TCP channel for now. > High CPU usage issue in RegistryDNS > --- > > Key: YARN-7351 > URL: https://issues.apache.org/jira/browse/YARN-7351 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-7351.yarn-native-services.01.patch, > YARN-7351.yarn-native-services.02.patch, > YARN-7351.yarn-native-services.03.patch, > YARN-7351.yarn-native-services.03.patch > > > Thanks [~aw] for finding this issue. > The current RegistryDNS implementation is always running on high CPU and > pretty much eats one core. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7243) Moving logging APIs over to slf4j in hadoop-yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212922#comment-16212922 ] Hadoop QA commented on YARN-7243: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 65 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 36s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 31s{color} | {color:green} root generated 0 new + 1251 unchanged - 5 fixed = 1251 total (was 1256) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 57s{color} | {color:orange} root: The patch generated 12 new + 3750 unchanged - 29 fixed = 3762 total (was 3779) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 47s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 17s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}174m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7243 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893180/YARN-7243.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c2649948c163 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git
[jira] [Commented] (YARN-7261) Add debug message for better download latency monitoring
[ https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212892#comment-16212892 ] Yufei Gu commented on YARN-7261: Thanks for the review, [~xiaochen]. Committed to trunk, branch-3.0 and branch-2. > Add debug message for better download latency monitoring > > > Key: YARN-7261 > URL: https://issues.apache.org/jira/browse/YARN-7261 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 2.9.0, 3.0.0, 3.1.0 > > Attachments: YARN-7261.001.patch, YARN-7261.002.patch, > YARN-7261.003.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7276) Federation Router Web Service fixes
[ https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-7276: -- Attachment: YARN-7276.002.patch > Federation Router Web Service fixes > --- > > Key: YARN-7276 > URL: https://issues.apache.org/jira/browse/YARN-7276 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-7276.000.patch, YARN-7276.001.patch, > YARN-7276.002.patch > > > While testing YARN-3661, I found a few issues with the REST interface in the > Router: > * No support for empty content (error 204) > * Media type support > * Attributes in {{FederationInterceptorREST}} > * Support for empty states and labels > * DefaultMetricsSystem initialization is missing -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7261) Add debug message for better download latency monitoring
[ https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-7261: --- Summary: Add debug message for better download latency monitoring (was: Add debug message in class FSDownload for better download latency monitoring) > Add debug message for better download latency monitoring > > > Key: YARN-7261 > URL: https://issues.apache.org/jira/browse/YARN-7261 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-7261.001.patch, YARN-7261.002.patch, > YARN-7261.003.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7261) Add debug message in class FSDownload for better download latency monitoring
[ https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212874#comment-16212874 ] Xiao Chen commented on YARN-7261: - +1 on patch 3, thanks Yufei! > Add debug message in class FSDownload for better download latency monitoring > > > Key: YARN-7261 > URL: https://issues.apache.org/jira/browse/YARN-7261 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-7261.001.patch, YARN-7261.002.patch, > YARN-7261.003.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7357) Several methods in TestZKRMStateStore.TestZKRMStateStoreTester.TestZKRMStateStoreInternal should have @Override annotations
[ https://issues.apache.org/jira/browse/YARN-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212856#comment-16212856 ] Hadoop QA commented on YARN-7357: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 49s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}117m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue | | | hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7357 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893169/YARN-7357.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a6ecef593760 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1f4cdf1 | | Default Java | 1.8.0_131 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/18057/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18057/testReport/ | | modules | C:
[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations
[ https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212848#comment-16212848 ] Miklos Szegedi commented on YARN-4511: -- Thank you, [~haibochen] for the patch. {code} 342 // notify schedulerNode of the update to correct resource accounting 343 node.containerUpdated(existingRMContainer, existingContainer); 344 345 ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer); 346 // notify SchedulerNode of the update to correct resource accounting 347 node.containerUpdated(tempRMContainer, tempContainer); 348 {code} I think that it would be nicer to lock around these two calls to become atomic. {code} 431 public int getNumOpportunisticContainers() { 432 return numOpportunisticContainers; 321 } {code} This function takes a sample but does not lock. This is fine, however we need to make sure it reflects the state of the object, so for example allocateContainer() should set this value as the last step after the allocatedContainers.put() call. If containerResourceAllocated fails in guaranteedContainerResourceAllocated we will still call allocatedContainers.put(). I think this may cause some inconsistencies in the future. Probably it is better to propagate the false return code all the way to the caller. isValidGuaranteedContainer and isValidOpportunisticContainer contain the same code. Should they be different? Would an isValidContainer function be sufficient? {code} 294 Container container = rmContainer.getContainer(); 295 if (container.getExecutionType() == ExecutionType.GUARANTEED) { 296 guaranteedContainerResourceReleased(container); 297 allocatedContainers.remove(containerId); 298 numGuaranteedContainers--; 299 } else { 300 opportunisticContainerResourceReleased(container); 301 numOpportunisticContainers--; 302 allocatedContainers.remove(containerId); 303 } {code} allocatedContainers.remove(containerId); can be placed outside the if. containerResourceReleased should decrease resourceAllocatedPendingLaunch, if the container has not been started, yet. guaranteedContainerResourceReleased may fail inside but regardless of the outcome, we decrease numGuaranteedContainers. {{ + ", which has " + getNumGuaranteedContainers() + " containers, "}} should be {{ + ", which has " + getNumGuaranteedContainers() + " guaranteed containers, "}} I do not see unit tests added for getNumOpportunisticContainers() and opportunistic container code paths added in general. > Common scheduler changes supporting scheduler-specific implementations > -- > > Key: YARN-4511 > URL: https://issues.apache.org/jira/browse/YARN-4511 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Haibo Chen > Attachments: YARN-4511-YARN-1011.00.patch, > YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, > YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, > YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories
[ https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212845#comment-16212845 ] Eric Yang commented on YARN-7353: - Thank you [~ebadger]. The test passes on CentOS 7. +1 I just committed this. > Docker permitted volumes don't properly check for directories > - > > Key: YARN-7353 > URL: https://issues.apache.org/jira/browse/YARN-7353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-7353.001.patch, YARN-7353.002.patch, > YARN-7353.003.patch > > > {noformat:title=docker-util.c:check_mount_permitted()} > // directory check > permitted_mount_len = strlen(permitted_mounts[i]); > if (permitted_mount_len > 0 > && permitted_mounts[i][permitted_mount_len - 1] == '/') { > if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) > == 0) { > ret = 1; > break; > } > } > {noformat} > This code will treat "/home/" as a directory, but not "/home" > {noformat} > [ FAILED ] 3 tests, listed below: > [ FAILED ] TestDockerUtil.test_check_mount_permitted > [ FAILED ] TestDockerUtil.test_normalize_mounts > [ FAILED ] TestDockerUtil.test_add_rw_mounts > {noformat} > Additionally, YARN-6623 introduced new test failures in the C++ > container-executor test "cetest" -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
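For illustration, a hedged re-expression in Java of the directory-check idea from the report above, where a permitted mount written as either "/home" or "/home/" is treated as a directory; the real fix lives in the C container-executor code, and the class and method names below are assumptions for the sketch:
{code:java}
// Illustrative only: a directory-prefix check where "/home" and "/home/"
// behave identically as permitted mounts.
class MountCheckSketch {
  static boolean isMountPermitted(String normalizedPath, String[] permittedMounts) {
    // Treat the candidate path as a directory by ensuring a trailing slash.
    String pathAsDir = normalizedPath.endsWith("/") ? normalizedPath : normalizedPath + "/";
    for (String permitted : permittedMounts) {
      if (permitted == null || permitted.isEmpty()) {
        continue;
      }
      // Normalize the permitted entry the same way, so the trailing slash
      // no longer decides whether it is treated as a directory.
      String permittedAsDir = permitted.endsWith("/") ? permitted : permitted + "/";
      // Permits the directory itself and anything underneath it, but not
      // sibling paths such as "/homestead" for a permitted "/home".
      if (pathAsDir.startsWith(permittedAsDir)) {
        return true;
      }
    }
    return false;
  }
}
{code}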
[jira] [Updated] (YARN-7360) TestRM.testNMTokenSentForNormalContainer() should be scheduler agnostic
[ https://issues.apache.org/jira/browse/YARN-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-7360: - Summary: TestRM.testNMTokenSentForNormalContainer() should be scheduler agnostic (was: TestRM.testNMTokenSentForNormalContainer() fails with Fair Scheduler) > TestRM.testNMTokenSentForNormalContainer() should be scheduler agnostic > --- > > Key: YARN-7360 > URL: https://issues.apache.org/jira/browse/YARN-7360 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: YARN-7360.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky
[ https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-7372: - Attachment: YARN-7372.01.patch Attached a new patch to address the checkstyle indentation issue. The TestDistributedScheduler failure is YARN-7299. > TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic > is flaky > > > Key: YARN-7372 > URL: https://issues.apache.org/jira/browse/YARN-7372 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Labels: unit-test > Attachments: YARN-7372.00.patch, YARN-7372.01.patch > > > testContainerUpdateExecTypeGuaranteedToOpportunistic waits for the container > to be running before it sends the container update request. > The container update is handled asynchronously in the node manager, and it does > not trigger a visible state transition. If the node manager event > dispatch thread is slow, the unit test can fail at the assertion > {code} Assert.assertEquals(ExecutionType.OPPORTUNISTIC, > status.getExecutionType());{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
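Because the update is applied asynchronously, one generic way to harden such an assertion is to poll until the reported execution type changes rather than asserting immediately. A minimal hedged sketch using Hadoop's GenericTestUtils; the status supplier here stands in for however the test reads container status and is not the actual test code:
{code:java}
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.test.GenericTestUtils;

// Illustrative only: poll for the asynchronously applied update instead of
// asserting immediately after the update request is sent.
class ExecTypeWaitSketch {
  interface StatusSupplier {
    // Stand-in for however the test reads the container's execution type,
    // e.g. via getContainerStatuses(); hypothetical, not the real test code.
    String currentExecutionType();
  }

  static void waitForOpportunistic(StatusSupplier status)
      throws TimeoutException, InterruptedException {
    GenericTestUtils.waitFor(
        () -> "OPPORTUNISTIC".equals(status.currentExecutionType()),
        100 /* check every ms */, 10000 /* timeout ms */);
  }
}
{code}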
[jira] [Commented] (YARN-7361) Improve the docker container runtime documentation
[ https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212819#comment-16212819 ] Eric Badger commented on YARN-7361: --- +1 (non-binding) looks good to me > Improve the docker container runtime documentation > -- > > Key: YARN-7361 > URL: https://issues.apache.org/jira/browse/YARN-7361 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-7361.001.patch > > > During review of YARN-7230, it was found that > yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker > containers documentation in most of the active branches. We can also improve > the warning that was introduced in YARN-6622. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7261) Add debug message in class FSDownload for better download latency monitoring
[ https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212810#comment-16212810 ] Hadoop QA commented on YARN-7261: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 36s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 12s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7261 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893189/YARN-7261.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 75414fcbb83a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1f4cdf1 | | Default Java | 1.8.0_131 | | unit |
[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories
[ https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212799#comment-16212799 ] Eric Badger commented on YARN-7353: --- Test failure is unrelated. [~eyang], [~vvasudev] could you review? > Docker permitted volumes don't properly check for directories > - > > Key: YARN-7353 > URL: https://issues.apache.org/jira/browse/YARN-7353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-7353.001.patch, YARN-7353.002.patch, > YARN-7353.003.patch > > > {noformat:title=docker-util.c:check_mount_permitted()} > // directory check > permitted_mount_len = strlen(permitted_mounts[i]); > if (permitted_mount_len > 0 > && permitted_mounts[i][permitted_mount_len - 1] == '/') { > if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) > == 0) { > ret = 1; > break; > } > } > {noformat} > This code will treat "/home/" as a directory, but not "/home" > {noformat} > [ FAILED ] 3 tests, listed below: > [ FAILED ] TestDockerUtil.test_check_mount_permitted > [ FAILED ] TestDockerUtil.test_normalize_mounts > [ FAILED ] TestDockerUtil.test_add_rw_mounts > {noformat} > Additionally, YARN-6623 introduced new test failures in the C++ > container-executor test "cetest" -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7361) Improve the docker container runtime documentation
[ https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212794#comment-16212794 ] Shane Kumpf commented on YARN-7361: --- The patch brings over the warning and missing property from YARN-7230. I believe we need this in trunk, branch-2, and branch-3.0. > Improve the docker container runtime documentation > -- > > Key: YARN-7361 > URL: https://issues.apache.org/jira/browse/YARN-7361 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-7361.001.patch > > > During review of YARN-7230, it was found that > yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker > containers documentation in most of the active branches. We can also improve > the warning that was introduced in YARN-6622. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories
[ https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212793#comment-16212793 ] Hadoop QA commented on YARN-7353: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 24m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 46s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 36s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7353 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893133/YARN-7353.003.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 9f1427ed9ff7 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1f4cdf1 | | Default Java | 1.8.0_131 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/18052/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/18052/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18052/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Docker permitted volumes don't properly check for directories > - > > Key: YARN-7353 > URL: https://issues.apache.org/jira/browse/YARN-7353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: YARN-7353.001.patch, YARN-7353.002.patch, > YARN-7353.003.patch > > > {noformat:title=docker-util.c:check_mount_permitted()} > // directory check > permitted_mount_len = strlen(permitted_mounts[i]); >
[jira] [Updated] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT
[ https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-7102: - Attachment: YARN-7102-branch-2.v9.patch Thanks for porting the patches! I'm uploading the branch-2 patch again since Jenkins never commented on it. When both patches were attached at the same time it only commented on the 2.8 patch. Speaking of the 2.8 patch, it deleted the Overrides annotation on NodeInfo#pullNewlyIncreasedContainers which I assume was unintentional. Otherwise it looks good. I agree the test failures are unrelated. TestClientRMTokens and TestAMAuthorization are failing due to unknown host exceptions triggered by the docker environment, and the capacity scheduler preemption test is passing locally. > NM heartbeat stuck when responseId overflows MAX_INT > > > Key: YARN-7102 > URL: https://issues.apache.org/jira/browse/YARN-7102 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Critical > Attachments: YARN-7102-branch-2.8.v10.patch, > YARN-7102-branch-2.8.v9.patch, YARN-7102-branch-2.v9.patch, > YARN-7102-branch-2.v9.patch, YARN-7102.v1.patch, YARN-7102.v2.patch, > YARN-7102.v3.patch, YARN-7102.v4.patch, YARN-7102.v5.patch, > YARN-7102.v6.patch, YARN-7102.v7.patch, YARN-7102.v8.patch, YARN-7102.v9.patch > > > ResponseId overflow problem in NM-RM heartbeat. This is same as AM-RM > heartbeat in YARN-6640, please refer to YARN-6640 for details. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
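For background, a hedged sketch of the kind of wrap-around-safe responseId handling this issue is about; it is illustrative only and not the logic of the attached patches:
{code:java}
// Illustrative only: responseId arithmetic that tolerates passing
// Integer.MAX_VALUE instead of leaving the heartbeat stuck.
final class ResponseIdSketch {
  private ResponseIdSketch() {
  }

  static int next(int responseId) {
    // Wrap back to 0 after MAX_VALUE rather than overflowing to a negative id.
    return responseId == Integer.MAX_VALUE ? 0 : responseId + 1;
  }

  static boolean isExpected(int lastResponseId, int received) {
    // A heartbeat is in sequence if it carries the wrapped successor id.
    return received == next(lastResponseId);
  }

  static boolean isResend(int lastResponseId, int received) {
    // A heartbeat carrying the previous id is a duplicate, even across the wrap.
    return received == lastResponseId;
  }
}
{code}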
[jira] [Commented] (YARN-7361) Improve the docker container runtime documentation
[ https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212788#comment-16212788 ] Hadoop QA commented on YARN-7361: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 29m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 42s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:ca8ddc6 | | JIRA Issue | YARN-7361 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12893223/YARN-7361.001.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 452de7033eaf 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1f4cdf1 | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/18055/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Improve the docker container runtime documentation > -- > > Key: YARN-7361 > URL: https://issues.apache.org/jira/browse/YARN-7361 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-7361.001.patch > > > During review of YARN-7230, it was found that > yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker > containers documentation in most of the active branches. We can also improve > the warning that was introduced in YARN-6622. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-7117: --- Attachment: YARN-7117.poc.1.patch > Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue > Mapping > -- > > Key: YARN-7117 > URL: https://issues.apache.org/jira/browse/YARN-7117 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: > YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, > YARN-7117.poc.1.patch, YARN-7117.poc.patch > > > Currently Capacity Scheduler doesn't support auto creation of queues when > doing queue mapping. We have seen more and more use cases which have complex > queue mapping policies configured to handle application-to-queue mapping. > The most common use case of CapacityScheduler queue mapping is to create one > queue for each user/group. However, updating {{capacity-scheduler.xml}} and > running {{RMAdmin:refreshQueues}} needs to be done whenever a new user/group > onboards. One option to solve the problem is to automatically create queues > when a new user/group arrives. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
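To make the maintenance burden in the description above concrete, here is a hedged sketch of the manual per-user queue mapping that exists today; the value follows the documented CapacityScheduler queue-mapping syntax, but the specific mapping shown is an assumption for illustration, not part of the attached proof-of-concept patch.
{code:xml}
<!-- Illustrative sketch of the current, manual approach (capacity-scheduler.xml).
     The mapping below sends each submitting user to a leaf queue with the
     same name as the user; that leaf queue must already have been defined. -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:%user:%user</value>
</property>
{code}
After editing the file, {{yarn rmadmin -refreshQueues}} still has to be run for every new user or group, which is exactly the manual step that auto-creation of leaf queues is intended to remove.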
[jira] [Updated] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping
[ https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-7117: --- Attachment: (was: YARN-7117.poc.1.patch) > Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue > Mapping > -- > > Key: YARN-7117 > URL: https://issues.apache.org/jira/browse/YARN-7117 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: > YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, > YARN-7117.poc.patch > > > Currently Capacity Scheduler doesn't support auto creation of queues when > doing queue mapping. We have seen more and more use cases which have complex > queue mapping policies configured to handle application-to-queue mapping. > The most common use case of CapacityScheduler queue mapping is to create one > queue for each user/group. However, updating {{capacity-scheduler.xml}} and > running {{RMAdmin:refreshQueues}} needs to be done whenever a new user/group > onboards. One option to solve the problem is to automatically create queues > when a new user/group arrives. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7169) Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2)
[ https://issues.apache.org/jira/browse/YARN-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212730#comment-16212730 ] Hadoop QA commented on YARN-7169: - (!) A patch to the testing environment has been detected. Re-executing against the patched versions to perform further tests. The console is at https://builds.apache.org/job/PreCommit-YARN-Build/18056/console in case of problems. > Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2) > -- > > Key: YARN-7169 > URL: https://issues.apache.org/jira/browse/YARN-7169 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineclient, timelinereader, timelineserver >Reporter: Vrushali C >Assignee: Vrushali C >Priority: Critical > Attachments: FlowRunDetails_Sleepjob.png, Metrics_Yarn_UI.png, > YARN-7169-YARN-3368_branch2.0001.patch, > YARN-7169-YARN-5355_branch2.0001.patch, > YARN-7169-YARN-5355_branch2.0002.patch, > YARN-7169-YARN-5355_branch2.0003.patch, > YARN-7169-YARN-5355_branch2.0004.patch, > YARN-7169-YARN-5355_branch2.0004.patch, YARN-7169-branch-2.0001.patch, > YARN-7169-branch-2.0002.patch, ui_commits(1) > > > Jira to track the backport of the new yarn-ui onto branch2. Right now it is > being added into Timeline Service v2's branch2 line, which is > YARN-5355_branch2. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org