[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Environment: 
RM recovery and NM recovery enabled;
Spark streaming application, a long-running application on yarn

  was:
Hadoop 2.7.1 RM recovery and NM recovery enabled;
Spark streaming application, a long-running application on yarn
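
The environment above ("RM recovery and NM recovery enabled") normally maps to a 
handful of settings; a minimal sketch using the public YarnConfiguration keys, 
with the state-store choice and recovery directory assumed here rather than 
taken from the report:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RecoverySettingsSketch {
  public static Configuration recoveryEnabledConf() {
    Configuration conf = new YarnConfiguration();
    // RM recovery: yarn.resourcemanager.recovery.enabled plus a state store
    // (ZK store assumed here; its connection settings are omitted).
    conf.setBoolean(YarnConfiguration.RECOVERY_ENABLED, true);
    conf.set(YarnConfiguration.RM_STORE,
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    // NM recovery: yarn.nodemanager.recovery.enabled plus a local recovery dir.
    conf.setBoolean(YarnConfiguration.NM_RECOVERY_ENABLED, true);
    conf.set(YarnConfiguration.NM_RECOVERY_DIR, "/var/lib/hadoop-yarn/nm-recovery");
    return conf;
  }
}
{code}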


> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 has been running on YARN for a long 
> time; app1 has *3 containers* in total, one of which, c1, runs on a NM named nm1;
> 1. The NM nm1 is lost for some reason, but the containers on it keep running 
> well; 
> 2. 10 minutes later, the RM expires this NM because no heartbeats were 
> received; the RM then tells app1's AM that a container of app1 failed because 
> the NM was lost, so app1's AM kills that container through RPC and requests a 
> new container, c2, from the RM, which duplicates c1;
> 3. The administrator notices nm1 is lost and restarts it; since NM recovery 
> is enabled, the NM restores all of its containers, including c1, but c1's 
> status is now 'DONE';
> *A bug here*: nm1 will list container c1 in its web UI forever;
> 4. The RM restarts for some reason; since RM recovery is enabled, the RM 
> restores all applications, including app1, and every NM must re-register with 
> the RM. However, when nm1 registers with the RM, the RM finds that container 
> c1's status is DONE, so it tells app1's AM that a container of app1 has 
> completed; since a Spark streaming application has a fixed number of 
> containers, the AM requests a new container, c3, from the RM, which again 
> duplicates c1.
> *A bug here*:
> Now app1 has *4 containers* in total, while *c2 and c3 are duplicates* (both 
> were allocated to replace c1).






[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 has been running on YARN for a long 
time; app1 has *3 containers* in total, one of which, c1, runs on a NM named nm1;

1. The NM nm1 is lost for some reason, but the containers on it keep running 
well;

2. 10 minutes later, the RM expires this NM because no heartbeats were received; 
the RM then tells app1's AM that a container of app1 failed because the NM was 
lost, so app1's AM kills that container through RPC and requests a new 
container, c2, from the RM, which duplicates c1;

3. The administrator notices nm1 is lost and restarts it; since NM recovery is 
enabled, the NM restores all of its containers, including c1, but c1's status is 
now 'DONE';
*A bug here*: nm1 will list container c1 in its web UI forever;

4. The RM restarts for some reason; since RM recovery is enabled, the RM 
restores all applications, including app1, and every NM must re-register with 
the RM. However, when nm1 registers with the RM, the RM finds that container 
c1's status is DONE, so it tells app1's AM that a container of app1 has 
completed; since a Spark streaming application has a fixed number of containers, 
the AM requests a new container, c3, from the RM, which again duplicates c1.

*A bug here*:
Now app1 has *4 containers* in total, while *c2 and c3 are duplicates* (both 
were allocated to replace c1).
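
In steps 2 and 4 above, the AM reacts to a container-completed report from the 
RM by asking for a replacement so its executor count stays fixed, which is how 
the duplicates c2 and c3 get allocated. A minimal sketch of that AM-side pattern 
against the public AMRMClientAsync API; the handler class, resource size, and 
priority are illustrative, not Spark's actual YarnAllocator code:

{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

class ReplacementHandler implements AMRMClientAsync.CallbackHandler {
  private final AMRMClientAsync<ContainerRequest> amRmClient;

  ReplacementHandler(AMRMClientAsync<ContainerRequest> amRmClient) {
    this.amRmClient = amRmClient;
  }

  // Called when the RM reports containers as finished -- both after the NM
  // expiry in step 2 and after nm1 re-registers with a DONE container in step 4.
  @Override
  public void onContainersCompleted(List<ContainerStatus> statuses) {
    for (ContainerStatus status : statuses) {
      // Keep the number of executors constant: one replacement ask per loss.
      amRmClient.addContainerRequest(new ContainerRequest(
          Resource.newInstance(2048, 1), null, null, Priority.newInstance(1)));
    }
  }

  @Override public void onContainersAllocated(List<Container> containers) { }
  @Override public void onShutdownRequest() { }
  @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
  @Override public void onError(Throwable e) { }
  @Override public float getProgress() { return 0.5f; }
}
{code}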


  was:
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE';
*A bug here*: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 

*A bug here*:
Now, app1 has *4 containers* in total, while *c2 and c3 were the same*.



> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time; 
> app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because of NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM, which is duplicate to c1;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE';
> *A bug here*: nm1 will list this container c1 in webui forever;
> 4. RM restart for some reason; since RM's recovery was enabled, RM restore 
> all the apps including app1, and all the NM need re-register to RM; However, 
> when nm1 registers to RM, RM found the container c1's status was DONE, so RM 
> tells app1's AM that a container of app1 was complete, since spark streaming 
> application has fixed number of containers, so AM request a new container 
> named c3 from RM, which is duplicate to c1. 
> *A bug here*:
> Now, app1 has *4 containers* in total, while *c2 and c3 were the same*.




[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE';
*A bug here*: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 

*A bug here*
Now, app1 has *4 containers* in total, while *c2 and c3 were the same*.


  was:
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE';
*A bug here*: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 

*A bug here*
Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.



> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time; 
> app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM, which is duplicate to c1;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE';
> *A bug here*: this NM will list this container in webui forever;
> 4. RM restart for some reason; since RM's recovery was enabled, RM restore 
> all the apps including app1, and all the NM need re-register to RM; However, 
> when nm1 registers to RM, RM found the container c1's status was DONE, so RM 
> tells app1's AM that a container of app1 was complete, since spark streaming 
> application has fixed number of containers, so AM request a new container 
> named c3 from RM, which is duplicate to c1. 
> *A bug here*
> Now, app1 has *4 containers* in total, while *c2 and c3 were the same*.




[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE';
*A bug here*: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 

*A bug here*:
Now, app1 has *4 containers* in total, while *c2 and c3 were the same*.


  was:
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE';
*A bug here*: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 

*A bug here*
Now, app1 has *4 containers* in total, while *c2 and c3 were the same*.



> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time; 
> app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM, which is duplicate to c1;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE';
> *A bug here*: this NM will list this container in webui forever;
> 4. RM restart for some reason; since RM's recovery was enabled, RM restore 
> all the apps including app1, and all the NM need re-register to RM; However, 
> when nm1 registers to RM, RM found the container c1's status was DONE, so RM 
> tells app1's AM that a container of app1 was complete, since spark streaming 
> application has fixed number of containers, so AM request a new container 
> named c3 from RM, which is duplicate to c1. 
> *A bug here*:
> Now, app1 has *4 containers* in total, while *c2 and c3 were the same*.




[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE';
*A bug here*: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 

*A bug here*
Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.


  was:
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 

Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.



> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time; 
> app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM, which is duplicate to c1;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE';
> *A bug here*: this NM will list this container in webui forever;
> 4. RM restart for some reason; since RM's recovery was enabled, RM restore 
> all the apps including app1, and all the NM need re-register to RM; However, 
> when nm1 registers to RM, RM found the container c1's status was DONE, so RM 
> tells app1's AM that a container of app1 was complete, since spark streaming 
> application has fixed number of containers, so AM request a new container 
> named c3 from RM, which is duplicate to c1. 
> *A bug here*
> Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.




[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 

Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.


  was:
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 
Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.



> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time; 
> app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM, which is duplicate to c1;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE'; A bug here: this NM will list this container in webui 
> forever;
> 4. RM restart for some reason; since RM's recovery was enabled, RM restore 
> all the apps including app1, and all the NM need re-register to RM; However, 
> when nm1 registers to RM, RM found the container c1's status was DONE, so RM 
> tells app1's AM that a container of app1 was complete, since spark streaming 
> application has fixed number of containers, so AM request a new container 
> named c3 from RM, which is duplicate to c1. 
> Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.






[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 

2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;

3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;

4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. 
Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.


  was:
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 
2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;
3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;
4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in total, 
while c2 and c3 was the same.



> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time; 
> app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM, which is duplicate to c1;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE'; A bug here: this NM will list this container in webui 
> forever;
> 4. RM restart for some reason; since RM's recovery was enabled, RM restore 
> all the apps including app1, and all the NM need re-register to RM; However, 
> when nm1 registers to RM, RM found the container c1's status was DONE, so RM 
> tells app1's AM that a container of app1 was complete, since spark streaming 
> application has fixed number of containers, so AM request a new container 
> named c3 from RM, which is duplicate to c1. 
> Now, app1 has *4 containers* in total, while *c2 and c3 was the same*.






[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 running on yarn for a long time; app1 
has *3 containers* in total, one of them named c1 runs on a NM named nm1;

1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 
2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM, which is duplicate to c1;
3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;
4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM; However, when 
nm1 registers to RM, RM found the container c1's status was DONE, so RM tells 
app1's AM that a container of app1 was complete, since spark streaming 
application has fixed number of containers, so AM request a new container named 
c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in total, 
while c2 and c3 was the same.


  was:
Case:
A Spark streaming application named app1 running on yarn for a long time, app1 
has a container named c1 on a NM named nm1;
1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 
2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM;
3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;
4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM;






> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time; 
> app1 has *3 containers* in total, one of them named c1 runs on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM, which is duplicate to c1;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE'; A bug here: this NM will list this container in webui 
> forever;
> 4. RM restart for some reason; since RM's recovery was enabled, RM restore 
> all the apps including app1, and all the NM need re-register to RM; However, 
> when nm1 registers to RM, RM found the container c1's status was DONE, so RM 
> tells app1's AM that a container of app1 was complete, since spark streaming 
> application has fixed number of containers, so AM request a new container 
> named c3 from RM, which is duplicate to c1. Now, app1 has *4 containers* in 
> total, while c2 and c3 was the same.






[jira] [Updated] (YARN-7339) LocalityMulticastAMRMProxyPolicy should handle cancel request properly

2017-10-20 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-7339:
---
Attachment: YARN-7339-v6.patch

retry as v6 patch...

> LocalityMulticastAMRMProxyPolicy should handle cancel request properly
> --
>
> Key: YARN-7339
> URL: https://issues.apache.org/jira/browse/YARN-7339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-7339-v1.patch, YARN-7339-v2.patch, 
> YARN-7339-v3.patch, YARN-7339-v4.patch, YARN-7339-v5.patch, YARN-7339-v6.patch
>
>
> Currently, inside AMRMProxy, LocalityMulticastAMRMProxyPolicy does not handle 
> and split cancel requests from the AM properly: 
> # For a node-level cancel request, we should not treat it as a localized 
> resource request; otherwise it can lead to an all-zero-weight issue when 
> computing the localized resource weights. 
> # For an ANY cancel, we should broadcast to all known subclusters, not just 
> the ones associated with localized resources.
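
For reference, a cancel in the AM-RM protocol is just an updated ResourceRequest 
whose container count is zero for a given (priority, resource name, capability) 
key. A minimal sketch of the two shapes discussed above; the host name and sizes 
are illustrative:

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class CancelRequestSketch {
  public static void main(String[] args) {
    Priority priority = Priority.newInstance(1);
    Resource size = Resource.newInstance(1024, 1);

    // Node-level cancel: zero containers asked on a specific host. Per point 1,
    // this should not be counted as a localized ask when computing weights.
    ResourceRequest nodeCancel =
        ResourceRequest.newInstance(priority, "host-42.example.com", size, 0);

    // ANY-level cancel: zero containers at ResourceRequest.ANY ("*"). Per point
    // 2, this should be broadcast to every known subcluster.
    ResourceRequest anyCancel =
        ResourceRequest.newInstance(priority, ResourceRequest.ANY, size, 0);

    System.out.println(nodeCancel);
    System.out.println(anyCancel);
  }
}
{code}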






[jira] [Commented] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT

2017-10-20 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213737#comment-16213737
 ] 

Botong Huang commented on YARN-7102:


Thanks [~jlowe] for the double check! When I did the cherry-pick for branch-2 
there weren't any conflicts. I think it was the auto-merge that messed up the 
annotation. Somehow Jenkins still didn't run for branch-2, though...

> NM heartbeat stuck when responseId overflows MAX_INT
> 
>
> Key: YARN-7102
> URL: https://issues.apache.org/jira/browse/YARN-7102
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Critical
> Attachments: YARN-7102-branch-2.8.v10.patch, 
> YARN-7102-branch-2.8.v9.patch, YARN-7102-branch-2.v9.patch, 
> YARN-7102-branch-2.v9.patch, YARN-7102.v1.patch, YARN-7102.v2.patch, 
> YARN-7102.v3.patch, YARN-7102.v4.patch, YARN-7102.v5.patch, 
> YARN-7102.v6.patch, YARN-7102.v7.patch, YARN-7102.v8.patch, YARN-7102.v9.patch
>
>
> ResponseId overflow problem in the NM-RM heartbeat. This is the same issue as 
> the AM-RM heartbeat overflow handled in YARN-6640; please refer to YARN-6640 
> for details. 
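
The heartbeat protocol matches the responseId a node sends against the id the RM 
handed out in its last response, advanced by one; once the counter passes 
Integer.MAX_VALUE a plain `lastId + 1` comparison overflows. A minimal 
illustration of a wrap-around-safe increment and duplicate check, assuming the 
convention of wrapping from MAX_VALUE back to 0 (this mirrors the idea, not 
necessarily the exact patch logic):

{code:java}
public final class ResponseIdSketch {
  private ResponseIdSketch() { }

  // Advance a heartbeat responseId, wrapping from Integer.MAX_VALUE back to 0
  // instead of overflowing into a negative value.
  static int next(int responseId) {
    return responseId == Integer.MAX_VALUE ? 0 : responseId + 1;
  }

  // A resent (duplicate) heartbeat carries the id just before the one in the
  // RM's last response; compare via next() so the check survives the wrap.
  static boolean isDuplicate(int lastResponseId, int heartbeatResponseId) {
    return next(heartbeatResponseId) == lastResponseId;
  }

  public static void main(String[] args) {
    System.out.println(next(Integer.MAX_VALUE));            // 0, not MIN_VALUE
    System.out.println(isDuplicate(0, Integer.MAX_VALUE));  // true across the wrap
  }
}
{code}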






[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 running on yarn for a long time, app1 
has a container named c1 on a NM named nm1;
1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 
2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM;
3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;
4. RM restart for some reason; since RM's recovery was enabled, RM restore all 
the apps including app1, and all the NM need re-register to RM;





  was:
Case:
A Spark streaming application named app1 running on yarn for a long time, app1 
has a container named c1 on a NM named nm1;
1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 
2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM;
3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;
4. RM restart for some reason; since RM's recovery was enabled, 






> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time, 
> app1 has a container named c1 on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE'; A bug here: this NM will list this container in webui 
> forever;
> 4. RM restart for some reason; since RM's recovery was enabled, RM restore 
> all the apps including app1, and all the NM need re-register to RM;






[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:
A Spark streaming application named app1 running on yarn for a long time, app1 
has a container named c1 on a NM named nm1;
1. The NM named nm1 was lost for some reason, but the containers on it runs 
well; 
2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
tells app1's AM that a container of app1 was failed because NM lost, so app1's 
AM killed that container through RPC and then request a new container named c2 
from RM;
3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
enabled, NM restore all the containers including container c1, but now c1's 
status is 'DONE'; A bug here: this NM will list this container in webui forever;
4. RM restart for some reason; since RM's recovery was enabled, 





  was:
Case:



> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:
> A Spark streaming application named app1 running on yarn for a long time, 
> app1 has a container named c1 on a NM named nm1;
> 1. The NM named nm1 was lost for some reason, but the containers on it runs 
> well; 
> 2. 10 minutes later, RM lost this NM because of no heartbeats received; so RM 
> tells app1's AM that a container of app1 was failed because NM lost, so 
> app1's AM killed that container through RPC and then request a new container 
> named c2 from RM;
> 3. Administrator found nm1 lost, so he restart it; since NM's recovery was 
> enabled, NM restore all the containers including container c1, but now c1's 
> status is 'DONE'; A bug here: this NM will list this container in webui 
> forever;
> 4. RM restart for some reason; since RM's recovery was enabled, 






[jira] [Updated] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-7377:
--
Description: 
Case:


> Duplicate Containers allocated for Long-Running Application after NM lost and 
> restart and RM restart
> 
>
> Key: YARN-7377
> URL: https://issues.apache.org/jira/browse/YARN-7377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications, nodemanager, RM, yarn
>Affects Versions: 3.0.0-alpha3
> Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
> Spark streaming application, a long-running application on yarn
>Reporter: rangjiaheng
>  Labels: patch
>
> Case:






[jira] [Created] (YARN-7377) Duplicate Containers allocated for Long-Running Application after NM lost and restart and RM restart

2017-10-20 Thread rangjiaheng (JIRA)
rangjiaheng created YARN-7377:
-

 Summary: Duplicate Containers allocated for Long-Running 
Application after NM lost and restart and RM restart
 Key: YARN-7377
 URL: https://issues.apache.org/jira/browse/YARN-7377
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager, RM, yarn
Affects Versions: 3.0.0-alpha3
 Environment: Hadoop 2.7.1 RM recovery and NM recovery enabled;
Spark streaming application, a long-running application on yarn
Reporter: rangjiaheng









[jira] [Commented] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213695#comment-16213695
 ] 

Hadoop QA commented on YARN-7276:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
33s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7276 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893391/YARN-7276.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f2ff7417a4fa 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 248d9b6 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18068/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18068/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: 

[jira] [Commented] (YARN-7376) YARN top ACLs

2017-10-20 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213694#comment-16213694
 ] 

Jonathan Hung commented on YARN-7376:
-

Fixed the unit test in 002. Also fixed compatibility by setting the default ACL to *.

> YARN top ACLs
> -
>
> Key: YARN-7376
> URL: https://issues.apache.org/jira/browse/YARN-7376
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7376.001.patch, YARN-7376.002.patch
>
>
> Currently YARN top can be invoked by everyone. But we want to avoid a 
> scenario where random users invoke YARN top, and potentially leave it 
> running. So we can implement ACLs to prevent this.
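
One way to gate a CLI entry point like this is Hadoop's AccessControlList check 
against the caller's UGI; a minimal sketch in which the property name 
yarn.cluster.top.acl is purely hypothetical (the actual key and wiring belong to 
the patch), with the default of "*" matching the compatibility note above:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

public class TopAclSketch {
  public static boolean callerAllowed(Configuration conf) throws IOException {
    // Hypothetical key; defaults to "*" (everyone) to stay backward compatible.
    AccessControlList acl =
        new AccessControlList(conf.get("yarn.cluster.top.acl", "*"));
    return acl.isUserAllowed(UserGroupInformation.getCurrentUser());
  }
}
{code}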






[jira] [Updated] (YARN-7376) YARN top ACLs

2017-10-20 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7376:

Attachment: YARN-7376.002.patch

> YARN top ACLs
> -
>
> Key: YARN-7376
> URL: https://issues.apache.org/jira/browse/YARN-7376
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7376.001.patch, YARN-7376.002.patch
>
>
> Currently YARN top can be invoked by everyone. But we want to avoid a 
> scenario where random users invoke YARN top, and potentially leave it 
> running. So we can implement ACLs to prevent this.






[jira] [Commented] (YARN-7376) YARN top ACLs

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213662#comment-16213662
 ] 

Hadoop QA commented on YARN-7376:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 43s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
17s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 93m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7376 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893367/YARN-7376.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 58743e6f5336 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 248d9b6 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/18066/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt
 |
| 

[jira] [Updated] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7276:
--
Attachment: YARN-7276.005.patch

Thanks [~subru] for the comments.
I added the multithreaded test in 005.
The rest were already done in 004.


> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276.000.patch, YARN-7276.001.patch, 
> YARN-7276.002.patch, YARN-7276.003.patch, YARN-7276.004.patch, 
> YARN-7276.005.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213645#comment-16213645
 ] 

Hadoop QA commented on YARN-7276:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 58s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
1s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7276 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893380/YARN-7276.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 06a342439c75 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 248d9b6 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18067/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18067/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: 

[jira] [Commented] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213604#comment-16213604
 ] 

Hadoop QA commented on YARN-7276:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 58s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m  0s{color} 
| {color:red} hadoop-yarn-server-router in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.router.webapp.TestRouterWebServicesREST |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7276 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893359/YARN-7276.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 667caa416fc5 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 248d9b6 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/18065/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18065/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18065/console |
| Powered by | Apache Yetus 

[jira] [Updated] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7276:
--
Attachment: YARN-7276.004.patch

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276.000.patch, YARN-7276.001.patch, 
> YARN-7276.002.patch, YARN-7276.003.patch, YARN-7276.004.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7318) Fix shell check warnings of SLS.

2017-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213574#comment-16213574
 ] 

Hudson commented on YARN-7318:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13120 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13120/])
YARN-7318. Fix shell check warnings of SLS. (Gergely Novák via wangda) (wangda: 
rev 281d83604df8341c210cee39bdc745ca793c5afa)
* (edit) hadoop-tools/hadoop-sls/src/main/bin/rumen2sls.sh
* (edit) hadoop-tools/hadoop-sls/src/main/bin/slsrun.sh
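
For context, SC2154 fires when a variable is referenced but never assigned in the
file shellcheck can see, as in the warnings quoted below. A minimal, generic sketch
of how that class of warning is usually addressed (illustrative only, not the
actual YARN-7318 change):

{code}
#!/usr/bin/env bash
# Illustrative sketch only, not the YARN-7318 patch.

# Option 1: give the variable a safe default so it is always assigned locally.
: "${args:=}"

# Option 2: if the variable really is assigned by a sourced helper script,
# tell shellcheck so explicitly instead of leaving the warning in place.
# shellcheck disable=SC2154
echo "extra arguments: ${args}"
{code}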


> Fix shell check warnings of SLS.
> 
>
> Key: YARN-7318
> URL: https://issues.apache.org/jira/browse/YARN-7318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Gergely Novák
> Fix For: 3.0.0
>
> Attachments: YARN-7318.001.patch
>
>
> Warnings like: 
> {code}
> hadoop-tools/hadoop-sls/src/main/bin/rumen2sls.sh:75:77: warning: args is 
> referenced but not assigned. [SC2154]
> hadoop-tools/hadoop-sls/src/main/bin/slsrun.sh:113:61: warning: args is 
> referenced but not assigned. [SC2154]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-7351) High CPU usage issue in RegistryDNS

2017-10-20 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7351:

Comment: was deleted

(was: -1 after applying patch 003, queries started failing when it is used in 
combination with the patch for YARN-7326.

{code}
[yarn@eyang-1 hadoop-3.1.0-SNAPSHOT]$ dig @localhost -p 5353 .
;; Warning: query response not set

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @localhost -p 5353 .
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 48353
;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; Query time: 9 msec
;; SERVER: 127.0.0.1#5353(127.0.0.1)
;; WHEN: Fri Oct 20 19:49:49 UTC 2017
;; MSG SIZE  rcvd: 12
{code}

This is because the response payload is bigger than a UDP datagram.  The TCP 
channel for the response is working, using the initialized NIOTCPChannel.)
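
For anyone reproducing the truncation symptom described above, a hedged follow-up
check is to force the same query over TCP, so the answer no longer has to fit in a
single UDP datagram (port 5353 as in the output above):

{code}
dig @localhost -p 5353 . +tcp
{code}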

> High CPU usage issue in RegistryDNS
> ---
>
> Key: YARN-7351
> URL: https://issues.apache.org/jira/browse/YARN-7351
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7351.yarn-native-services.01.patch, 
> YARN-7351.yarn-native-services.02.patch, 
> YARN-7351.yarn-native-services.03.patch, 
> YARN-7351.yarn-native-services.03.patch
>
>
> Thanks [~aw] for finding this issue.
> The current RegistryDNS implementation is always running on high CPU and 
> pretty much eats one core. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7217) Improve API service usability for updating service spec and state

2017-10-20 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213538#comment-16213538
 ] 

Eric Yang commented on YARN-7217:
-

The findbugs warning is not introduced by this JIRA.  [~billie.rinaldi] 
[~jianhe], would you mind taking another pass on patch 5?  Thank you
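
As a quick usability check of the split endpoints proposed in the description
quoted below, a hedged sketch of how a client might call them (the endpoints are
only proposed here, not an existing API; the host, port, service name and the
updated-spec.json file are placeholders):

{code}
# Placeholder address for the API service endpoint.
RM=http://localhost:8088

# Update only the desired spec (for example a changed number_of_containers);
# updated-spec.json is a placeholder file holding the edited spec JSON.
curl -X PUT -H 'Content-Type: application/json' \
     -d @updated-spec.json "$RM/ws/v1/services/amp/spec"

# Separately request a state change on a single component.
curl -X PUT -H 'Content-Type: application/json' \
     -d '{"name":"amp","components":[{"name":"mysql","state":"STOPPED"}]}' \
     "$RM/ws/v1/services/amp/state"
{code}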

> Improve API service usability for updating service spec and state
> -
>
> Key: YARN-7217
> URL: https://issues.apache.org/jira/browse/YARN-7217
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, applications
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7217.yarn-native-services.001.patch, 
> YARN-7217.yarn-native-services.002.patch, 
> YARN-7217.yarn-native-services.003.patch, 
> YARN-7217.yarn-native-services.004.patch, 
> YARN-7217.yarn-native-services.005.patch
>
>
> The API service for deploying and managing YARN services has several limitations.
> {{updateService}} API provides multiple functions:
> # Stopping a service.
> # Start a service.
> # Increase or decrease number of containers.  (This was removed in YARN-7323).
> The overloading is buggy depending on how the configuration should be applied.
> h4. Scenario 1
> A user retrieves the Service object from the getService call, and the Service 
> object contains state: STARTED.  The user would like to increase the number of 
> containers for the deployed service.  The JSON has been updated to increase the 
> container count, but the PUT method does not actually increase the container count.
> h4. Scenario 2
> A user retrieves the Service object from the getService call, and the Service 
> object contains state: STOPPED.  The user would like to make an environment 
> configuration change.  The configuration does not get updated after the PUT 
> method.
> This could be addressed by rearranging the START/STOP logic to run after the 
> configuration update.  However, there are other potential combinations that can 
> break the PUT method.  For example, a user may want to make configuration changes 
> but not restart the service until a later time.
> h4. Scenario 3
> There is no API to list all deployed applications by the same user.
> h4. Scenario 4
> Desired state (spec) and current state are represented by the same Service 
> object.  There is no easy way to tell whether "state" is the desired state to 
> reach or the current state of the service.  It would be nice to be able to 
> retrieve the desired state and the current state through separate entry points.  
> Implementing /spec and /state would resolve this problem.
> h4. Scenario 5
> Listing all services deployed by the same user can trigger a directory listing 
> operation on the namenode if HDFS is used as storage for metadata.  When hundreds 
> of users use the Service UI to view or deploy applications, this effectively 
> becomes a denial of service attack on the namenode.  The sparse small metadata 
> files also reduce the efficiency of namenode memory usage.  Hence, a cache layer 
> for storing service metadata can reduce namenode stress.
> h3. Proposed change
> ApiService can separate the PUT method into two PUT methods, one for 
> configuration changes and one for operational changes.  The new API could look like:
> {code}
> @PUT
> /ws/v1/services/[service_name]/spec
> Request Data:
> {
>   "name": "amp",
>   "components": [
> {
>   "name": "mysql",
>   "number_of_containers": 2,
>   "artifact": {
> "id": "centos/mysql-57-centos7:latest",
> "type": "DOCKER"
>   },
>   "run_privileged_container": false,
>   "launch_command": "",
>   "resource": {
> "cpus": 1,
> "memory": "2048"
>   },
>   "configuration": {
> "env": {
>   "MYSQL_USER":"${USER}",
>   "MYSQL_PASSWORD":"password"
> }
>   }
>  }
>   ],
>   "quicklinks": {
> "Apache Document Root": 
> "http://httpd.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/",
> "PHP MyAdmin": "http://phpmyadmin.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/"
>   }
> }
> {code}
> {code}
> @PUT
> /ws/v1/services/[service_name]/state
> Request data:
> {
>   "name": "amp",
>   "components": [
> {
>   "name": "mysql",
>   "state": "STOPPED"
>  }
>   ]
> }
> {code}
> SOLR can be used to cache the Yarnfile to improve lookup performance and reduce 
> the stress of the namenode small-file problem and high-frequency lookups.  SOLR is 
> chosen for caching metadata because its indexing feature can also be used to build 
> full text search for the application catalog.
> For a service that requires configuration changes to increase or decrease the 
> container count, the calling sequence is:
> {code}
> # GET /ws/v1/services/{service_name}/spec
> # Change number_of_containers to desired number.
> # PUT /ws/v1/services/{service_name}/spec to update the spec.
> # PUT /ws/v1/services/{service_name}/state to stop existing 

[jira] [Commented] (YARN-7326) Some issues in RegistryDNS

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213535#comment-16213535
 ] 

Hadoop QA commented on YARN-7326:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
15s{color} | {color:red} Docker failed to build yetus/hadoop:0de40f0. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7326 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893368/YARN-7326.yarn-native-services.003.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18064/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Some issues in RegistryDNS
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7326.yarn-native-services.001.patch, 
> YARN-7326.yarn-native-services.002.patch, 
> YARN-7326.yarn-native-services.003.patch
>
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7326) Some issues in RegistryDNS

2017-10-20 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7326:

Attachment: (was: YARN-7326.yarn-native-services.003.patch)

> Some issues in RegistryDNS
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7326.yarn-native-services.001.patch, 
> YARN-7326.yarn-native-services.002.patch, 
> YARN-7326.yarn-native-services.003.patch
>
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7326) Some issues in RegistryDNS

2017-10-20 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7326:

Attachment: YARN-7326.yarn-native-services.003.patch

> Some issues in RegistryDNS
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7326.yarn-native-services.001.patch, 
> YARN-7326.yarn-native-services.002.patch, 
> YARN-7326.yarn-native-services.003.patch
>
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7376) YARN top ACLs

2017-10-20 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213529#comment-16213529
 ] 

Jonathan Hung commented on YARN-7376:
-

Attached 001 patch which adds {{yarn.top.acl}} for ACLs on client side.

[~vvasudev], can you take a look? Thanks!

> YARN top ACLs
> -
>
> Key: YARN-7376
> URL: https://issues.apache.org/jira/browse/YARN-7376
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7376.001.patch
>
>
> Currently YARN top can be invoked by everyone. But we want to avoid a 
> scenario where random users invoke YARN top, and potentially leave it 
> running. So we can implement ACLs to prevent this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7376) YARN top ACLs

2017-10-20 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7376:

Attachment: YARN-7376.001.patch

> YARN top ACLs
> -
>
> Key: YARN-7376
> URL: https://issues.apache.org/jira/browse/YARN-7376
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7376.001.patch
>
>
> Currently YARN top can be invoked by everyone. But we want to avoid a 
> scenario where random users invoke YARN top, and potentially leave it 
> running. So we can implement ACLs to prevent this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7376) YARN top ACLs

2017-10-20 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-7376:
---

 Summary: YARN top ACLs
 Key: YARN-7376
 URL: https://issues.apache.org/jira/browse/YARN-7376
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung
Assignee: Jonathan Hung


Currently YARN top can be invoked by everyone. But we want to avoid a scenario 
where random users invoke YARN top, and potentially leave it running. So we can 
implement ACLs to prevent this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7375) NPE in the RM Webapp when HA is enabled and the active RM fails

2017-10-20 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7375:

Description: 
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:327)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:133)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280)
at 
org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at 
org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58)

Steps:
1. RM HA is enabled
2. Started a service 
3. Active RM failed. 
4. Switched to the Web UI of Standby RM 
5. Clicked to view the containers of the previously started application and 
landed on an error page.
6. The NPE mentioned above was found in the standby RM logs
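
A hedged sketch of simulating the failover described in the steps above from the
command line (rm1/rm2 are placeholder RM IDs; --forcemanual is only needed when
automatic failover is enabled):

{code}
yarn rmadmin -getServiceState rm1
yarn rmadmin -transitionToStandby --forcemanual rm1
yarn rmadmin -getServiceState rm2
{code}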

  was:
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:327)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:133)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280)
at 
org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at 
org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58)


> NPE in the RM Webapp when HA is enabled and the active RM fails
> ---
>
> Key: YARN-7375
> URL: https://issues.apache.org/jira/browse/YARN-7375
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:327)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:133)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280)
> at 
> org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at 
> 

[jira] [Created] (YARN-7375) NPE in the RM Webapp when HA is enabled and the active RM fails

2017-10-20 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-7375:
---

 Summary: NPE in the RM Webapp when HA is enabled and the active RM 
fails
 Key: YARN-7375
 URL: https://issues.apache.org/jira/browse/YARN-7375
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chandni Singh


Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:327)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.<init>(AppInfo.java:133)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280)
at 
org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at 
org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-10-20 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved YARN-6142.
--
   Resolution: Information Provided
Fix Version/s: 3.0.0

Protobuf and JACC analysis done.  Will continue rolling upgrade reviews at 
HDFS-11096.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7276:
--
Attachment: YARN-7276.003.patch

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276.000.patch, YARN-7276.001.patch, 
> YARN-7276.002.patch, YARN-7276.003.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-10-20 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213458#comment-16213458
 ] 

Ray Chiang commented on YARN-6142:
--

Minor issues found by JACC.

YARN-2696
- CapacityScheduler#getQueueComparator() split into partitioned/non-partitioned 
comparators

YARN-3139
- Removed synchronized from CapacityScheduler#getContainerTokenSecretManager()
- Removed synchronized from CapacityScheduler#getRMContext()
- Removed synchronized from CapacityScheduler#setRMContext()

YARN-3413
- YarnClient#getClusterNodeLabels() changed return type

YARN-3866
- Major refactor in Public APIs for AM-RM for handling container resizing.
- Change went into both 2.8.0 and 3.0.0.

YARN-3873
- CapacityScheduler#getApplicationComparator() removed

YARN-4593
- AbstractService#getConfig() removed synchronized

YARN-5077
- Removed SchedulingPolicy#checkIfAMResourceUsageOverLimit()

YARN-5221
- AllocateRequest / AllocateResponse has methods changed from Public/Stable to 
Public/Unstable

YARN-5713
- Update jackson affects TimelineUtils#dumpTimelineRecordtoJSON()


> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping

2017-10-20 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213448#comment-16213448
 ] 

Suma Shivaprasad commented on YARN-7117:


Attached a doc depicting the workflow and classes for auto queue creation and 
capacity management for these queues.

> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue 
> Mapping
> --
>
> Key: YARN-7117
> URL: https://issues.apache.org/jira/browse/YARN-7117
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: 
> YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, 
> YARN-7117.poc.1.patch, YARN-7117.poc.patch, YARN-7117_Workflow.pdf
>
>
> Currently Capacity Scheduler doesn't support auto creation of queues when 
> doing queue mapping. We see more and more use cases that have complex queue 
> mapping policies configured to handle application-to-queue mapping. 
> The most common use case of CapacityScheduler queue mapping is to create one 
> queue for each user/group. However, updating {{capacity-scheduler.xml}} and 
> running {{RMAdmin:refreshQueues}} is needed whenever a new user/group onboards. 
> One option to solve the problem is to automatically create queues when a new 
> user/group arrives.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping

2017-10-20 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7117:
---
Attachment: YARN-7117_Workflow.pdf

> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue 
> Mapping
> --
>
> Key: YARN-7117
> URL: https://issues.apache.org/jira/browse/YARN-7117
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: 
> YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, 
> YARN-7117.poc.1.patch, YARN-7117.poc.patch, YARN-7117_Workflow.pdf
>
>
> Currently Capacity Scheduler doesn't support auto creation of queues when 
> doing queue mapping. We see more and more use cases that have complex queue 
> mapping policies configured to handle application-to-queue mapping. 
> The most common use case of CapacityScheduler queue mapping is to create one 
> queue for each user/group. However, updating {{capacity-scheduler.xml}} and 
> running {{RMAdmin:refreshQueues}} is needed whenever a new user/group onboards. 
> One option to solve the problem is to automatically create queues when a new 
> user/group arrives.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7217) Improve API service usability for updating service spec and state

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213442#comment-16213442
 ] 

Hadoop QA commented on YARN-7217:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 34 new or modified test 
files. {color} |
|| || || || {color:brown} yarn-native-services Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
56s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
58s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
13s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
22s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
41s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
18s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
yarn-native-services has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
13s{color} | {color:green} yarn-native-services passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
4s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 40s{color} | {color:orange} root: The patch generated 10 new + 247 unchanged 
- 9 fixed = 257 total (was 256) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
14s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
23s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
20s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
50s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
12s{color} | {color:green} hadoop-yarn-services-api 

[jira] [Updated] (YARN-7326) Some issues in RegistryDNS

2017-10-20 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7326:

Attachment: YARN-7326.yarn-native-services.003.patch

Fixed error code handling.  Some error codes were not handled correctly for 
non-existent domains and unauthorized domains.
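
A hedged way to spot-check the returned status codes after such a change (the
names are placeholders; port 54 as in the quoted output below): the first query
should come back NXDOMAIN for a name that does not exist, while the second shows
what the server answers for a zone it is not authoritative for.

{code}
dig @localhost -p 54 does-not-exist.ycluster.example. +noall +comments
dig @localhost -p 54 example.org. +noall +comments
{code}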

> Some issues in RegistryDNS
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7326.yarn-native-services.001.patch, 
> YARN-7326.yarn-native-services.002.patch, 
> YARN-7326.yarn-native-services.003.patch
>
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213332#comment-16213332
 ] 

Hadoop QA commented on YARN-7374:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
0s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
21s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 7 unchanged - 1 fixed = 7 total (was 8) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
58s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 50s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue |
|   | 
hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService 
|
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7374 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893323/YARN-7374.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 26172b0bc3eb 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 

[jira] [Commented] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky

2017-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213319#comment-16213319
 ] 

Hudson commented on YARN-7372:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13119 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13119/])
YARN-7372. (haibochen: rev 480187aebbc13547af06684820a416d22e7c4649)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerQueuing.java


> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
>  is flaky 
> 
>
> Key: YARN-7372
> URL: https://issues.apache.org/jira/browse/YARN-7372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>  Labels: unit-test
> Attachments: YARN-7372.00.patch, YARN-7372.01.patch
>
>
> testContainerUpdateExecTypeGuaranteedToOpportunistic waits for the container 
> to be running before it sends the container update request.
> The container update is handled asynchronously in the node manager, and it does 
> not trigger a visible state transition. If the node manager event
> dispatch thread is slow, the unit test can fail at the assertion 
> {code} Assert.assertEquals(ExecutionType.OPPORTUNISTIC, 
> status.getExecutionType());{code}
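
A minimal sketch of one way to harden such a test is to poll until the updated execution type becomes visible and only then assert; the helper below is illustrative and not taken from the actual patch:

{code:java}
import java.util.function.Supplier;

import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.junit.Assert;

public class ExecutionTypeTestUtil {
  // Poll until the container reports the expected execution type or the
  // timeout expires, so a slow NM event dispatcher does not fail the test.
  public static void waitForExecutionType(Supplier<ExecutionType> current,
      ExecutionType expected, long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (current.get() != expected
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
    Assert.assertEquals(expected, current.get());
  }
}
{code}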



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7373) The atomicity of container update in RM is not clear

2017-10-20 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213280#comment-16213280
 ] 

Arun Suresh edited comment on YARN-7373 at 10/20/17 9:40 PM:
-

[~haibochen] / [~miklos.szeg...@cloudera.com]

So, like I mentioned in the earlier JIRA, what we have in trunk currently is 
mostly atomic because:
# the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} 
method in the SchedulerApplicationAttempt - during which the thread has 
acquired a write lock on the application. You don't need a lock on the queue, 
and since there are no changes to the node, there is no need for that either. 
# The only concurrent action that can happen is that the Node where the 
Container is running might have heart-beaten in - but that operation, 
releaseContainer, tries to take a lock on the app too, which will have to 
contend with the write lock acquired in {{pullNewlyUpdatedContainers}} - so we 
are good there.
# It is possible that multiple container update requests (say container 
increase requests) for containers running on the same node can come in 
concurrently - but the flow is such that the actual resource allocation for the 
update is internally treated as a new (temporary) container 
allocation - and just like any normal container allocation in the scheduler, 
they are serialized.
# It is possible that multiple update requests for the SAME container can 
come in too - but we have a container version that takes care of that.

Although, I do have to mention that the code you pasted above - which is part 
of the changes in YARN-4511 - can cause a few problems, since you are updating 
the node as well; you might need a lock on the node before you do that, for 
example along the lines of the sketch below.
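
A rough sketch of wrapping the two node updates in a single critical section; the lock object here is an assumption (the real code would reuse whatever lock already guards the SchedulerNode), and the container variables are the ones from the snippet in the description:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Assumed lock guarding the node's resource accounting; illustrative only.
private final ReentrantReadWriteLock.WriteLock nodeWriteLock =
    new ReentrantReadWriteLock().writeLock();

private void swapContainerOnNode() {
  nodeWriteLock.lock();
  try {
    // notify SchedulerNode of the update to correct resource accounting
    node.containerUpdated(existingRMContainer, existingContainer);
    ((RMContainerImpl) tempRMContainer).setContainer(updatedTempContainer);
    node.containerUpdated(tempRMContainer, tempContainer);
  } finally {
    nodeWriteLock.unlock();
  }
}
{code}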


was (Author: asuresh):
[~haibochen] / [~miklos.szeg...@cloudera.com]

So, like I mentioned in the earlier JIRA, what we have in trunk currently is 
mostly atomic because:
# the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} 
method in the SchedulerApplicationAttempt - during which the thread has 
acquired a write lock on the application. You don't need a lock on the queue 
and since there are no changes to the node, there is not need for that either. 
# The only concurrent action that can happen, is that the Node where the 
Container is running might have heart-beaten in - but that operation, 
releaseContainer, tries to take a lock on the app too, which will have to 
contend with the writelock acquired in {{pullNewlyUpdatedContainers}} - so we 
are good there
# It is possible that multiple container update requests (say container 
increase requests) for containers running on the same node can come in 
concurrently - but the flow is such that the actual resource allocation for the 
update is internally treated as a new (temporary) container container 
allocation - and just like any normal container allocation in the scheduler, 
they are serialized.
# It is possible that multiple container requests for the SAME container can 
come in too - but we have a container version that takes care of that.

> The atomicity of container update in RM is not clear
> 
>
> Key: YARN-7373
> URL: https://issues.apache.org/jira/browse/YARN-7373
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that  
> {code:java}
> 342   // notify schedulerNode of the update to correct resource accounting
> 343   node.containerUpdated(existingRMContainer, existingContainer);
> 344   
> 345   
> ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
> 346   // notify SchedulerNode of the update to correct resource accounting
> 347   node.containerUpdated(tempRMContainer, tempContainer);
> 348   
> {code}
> bq. I think that it would be nicer to lock around these two calls to become 
> atomic.
> Container update, and thus container swap as part of that, is atomic 
> according to [~asuresh].
> It'd be nice to discuss this in more detail to see if we want to be more 
> conservative.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7373) The atomicity of container update in RM is not clear

2017-10-20 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213280#comment-16213280
 ] 

Arun Suresh commented on YARN-7373:
---

[~haibochen] / [~miklos.szeg...@cloudera.com]

So, like I mentioned in the earlier JIRA, what we have in trunk currently is 
mostly atomic because:
# the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} 
method in the SchedulerApplicationAttempt - during which the thread has 
acquired a write lock on the application. You don't need a lock on the queue, 
and since there are no changes to the node, there is no need for that either. 
# The only concurrent action that can happen is that the Node where the 
Container is running might have heart-beaten in - but that operation, 
releaseContainer, tries to take a lock on the app too, which will have to 
contend with the write lock acquired in {{pullNewlyUpdatedContainers}} - so we 
are good there.
# It is possible that multiple container update requests (say container 
increase requests) for containers running on the same node can come in 
concurrently - but the flow is such that the actual resource allocation for the 
update is internally treated as a new (temporary) container 
allocation - and just like any normal container allocation in the scheduler, 
they are serialized.
# It is possible that multiple update requests for the SAME container can 
come in too - but we have a container version that takes care of that.

> The atomicity of container update in RM is not clear
> 
>
> Key: YARN-7373
> URL: https://issues.apache.org/jira/browse/YARN-7373
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that  
> {code:java}
> 342   // notify schedulerNode of the update to correct resource accounting
> 343   node.containerUpdated(existingRMContainer, existingContainer);
> 344   
> 345   
> ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
> 346   // notify SchedulerNode of the update to correct resource accounting
> 347   node.containerUpdated(tempRMContainer, tempContainer);
> 348   
> {code}
> bq. I think that it would be nicer to lock around these two calls to become 
> atomic.
> Container update, and thus container swap as part of that, is atomic 
> according to [~asuresh].
> It'd be nice to discuss this in more detail to see if we want to be more 
> conservative.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky

2017-10-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213265#comment-16213265
 ] 

Haibo Chen commented on YARN-7372:
--

Thanks [~asuresh] for the review! Will check it in shortly.

> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
>  is flaky 
> 
>
> Key: YARN-7372
> URL: https://issues.apache.org/jira/browse/YARN-7372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>  Labels: unit-test
> Attachments: YARN-7372.00.patch, YARN-7372.01.patch
>
>
> testContainerUpdateExecTypeGuaranteedToOpportunistic waits for the container 
> to be running before it sends the container update request.
> The container update is handled asynchronously in the node manager, and it does 
> not trigger a visible state transition. If the node manager event
> dispatch thread is slow, the unit test can fail at the assertion 
> {code} Assert.assertEquals(ExecutionType.OPPORTUNISTIC, 
> status.getExecutionType());{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler

2017-10-20 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213254#comment-16213254
 ] 

Yufei Gu commented on YARN-7374:


Thanks for working on this [~templedf]. The patch generally looks good to me. 
Some nits:
- Would you like to publish the performance comparison results?
- Remove "* @param n the number of resource types" from method {{compare2()}}. 
- There are two empty lines before "// A queue is needy for its min share if its 
dominant resource".
- The code would be cleaner if method {{compare2}} and its supporting methods 
were moved to a separate class.
- It may be a good idea to add a comment indicating how to get the non-dominant 
index, or a new method like {{getNonDominateIndex(int dominant) { return 1 - 
dominant}}}.
- Could this code be put into a separate method, since it is invoked several 
times? (One possible extraction is sketched after the snippet below.) 
{code}
  if (res == 0) {
// Apps are tied in fairness ratio. Break the tie by submit time and job
// name to get a deterministic ordering, which is useful for unit tests.
res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());

if (res == 0) {
  res = s1.getName().compareTo(s2.getName());
}
  }
{code}
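
For illustration, the tie-breaking block above could be extracted roughly as follows; the method name and the {{Schedulable}} parameter type are assumptions, not taken from the patch:

{code:java}
// Hypothetical helper: break a fairness-ratio tie by submit time, then by
// name, so the ordering stays deterministic (useful for unit tests).
private int breakFairnessTie(int res, Schedulable s1, Schedulable s2) {
  if (res == 0) {
    res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
    if (res == 0) {
      res = s1.getName().compareTo(s2.getName());
    }
  }
  return res;
}
{code}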


> Improve performance of DRF comparisons for resource types in fair scheduler
> ---
>
> Key: YARN-7374
> URL: https://issues.apache.org/jira/browse/YARN-7374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-7374.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-10-20 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213251#comment-16213251
 ] 

Ray Chiang commented on YARN-6142:
--

I'm done with the JACC analysis, but need to do the same type of writeup that 
was done for protobuf.

The quick answer is that we don't have any major red flags, but I'm going to 
note some potential incompatibilities that are very minor, but could affect 
some random API user out there.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7170) Improve bower dependencies for YARN UI v2

2017-10-20 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-7170:
-
Fix Version/s: 2.9.0

> Improve bower dependencies for YARN UI v2
> -
>
> Key: YARN-7170
> URL: https://issues.apache.org/jira/browse/YARN-7170
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Fix For: 2.9.0, 3.0.0
>
> Attachments: YARN-7170.001.patch, YARN-7170.002.patch
>
>
> [INFO] bower ember#2.2.0   progress Receiving
> objects:  50% (38449/75444), 722.46 MiB | 3.30 MiB/s
> ...
> [INFO] bower ember#2.2.0   progress Receiving
> objects:  99% (75017/75444), 1.56 GiB | 3.31 MiB/s
> Investigate the dependencies to reduce the download size and improve the 
> speed of compilation.
> cc/ [~Sreenath] and [~akhilpb]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7338) Support same origin policy for cross site scripting prevention.

2017-10-20 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-7338:
-
Fix Version/s: 2.9.0

> Support same origin policy for cross site scripting prevention.
> ---
>
> Key: YARN-7338
> URL: https://issues.apache.org/jira/browse/YARN-7338
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Reporter: Vrushali C
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7338.001.patch
>
>
> Opening jira as suggested by [~eyang] on the thread for merging YARN-3368 (new 
> web UI) to branch-2:  
> http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201610.mbox/%3ccad++ecmvvqnzqz9ynkvkcxaczdkg50yiofxktgk3mmms9sh...@mail.gmail.com%3E
> --
> Ui2 does not seem to support the same origin policy for cross site scripting 
> prevention.
> The following parameters have no effect for /ui2:
> hadoop.http.cross-origin.enabled = true
> yarn.resourcemanager.webapp.cross-origin.enabled = true
> This is because ui2 is designed as a separate web application.  The WebFilters 
> set up for the existing resource manager don't apply to the new web application.
> Please open a JIRA to track the security issue and resolve the problem prior to 
> backporting this to branch-2.
> This would minimize the risk of opening up a security hole in branch-2.
> --
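
For context, one way the separate ui2 web application could pick up the same filter is sketched below; the listener class and its wiring are assumptions for illustration only, not part of this issue or any patch:

{code:java}
import javax.servlet.FilterRegistration;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

import org.apache.hadoop.security.http.CrossOriginFilter;

// Illustrative only: register Hadoop's CrossOriginFilter on the ui2 webapp so
// that cross-origin handling applies to /ui2 as well.
@WebListener
public class Ui2CrossOriginInitializer implements ServletContextListener {
  @Override
  public void contextInitialized(ServletContextEvent sce) {
    FilterRegistration.Dynamic reg = sce.getServletContext()
        .addFilter("Cross Origin Filter", CrossOriginFilter.class);
    // Apply the filter to every request handled by the ui2 web application.
    reg.addMappingForUrlPatterns(null, false, "/*");
  }

  @Override
  public void contextDestroyed(ServletContextEvent sce) {
    // nothing to clean up
  }
}
{code}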



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky

2017-10-20 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213215#comment-16213215
 ] 

Arun Suresh commented on YARN-7372:
---

+1, Thanks [~haibochen]

> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
>  is flaky 
> 
>
> Key: YARN-7372
> URL: https://issues.apache.org/jira/browse/YARN-7372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>  Labels: unit-test
> Attachments: YARN-7372.00.patch, YARN-7372.01.patch
>
>
> testContainerUpdateExecTypeGuaranteedToOpportunistic waits for the container 
> to be running before it sends the container update request.
> The container update is handled asynchronously in the node manager, and it does 
> not trigger a visible state transition. If the node manager event
> dispatch thread is slow, the unit test can fail at the assertion 
> {code} Assert.assertEquals(ExecutionType.OPPORTUNISTIC, 
> status.getExecutionType());{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213197#comment-16213197
 ] 

Subru Krishnan commented on YARN-7276:
--

Thanks [~elgoiri] for the fixes. I looked at it and it is mostly good; minor 
comments below:
* DefaultMetricsSystem initialization seems to be missing in the patch.
* Add tests to check empty states and labels?
* Would it be possible to have a multi-threaded test? (A rough shape is 
sketched below.)
* Nit: {{FederationInterceptorREST::getCopy}} --> 
{{FederationInterceptorREST::Clone}}, and mention in the comment that this is 
for thread safety. 
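
A minimal sketch of what such a multi-threaded test could look like; {{callInterceptor}} is a placeholder for the actual REST call under test, and none of these names come from the patch:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentInterceptorCheck {
  // Placeholder for the real REST call exercised by the test.
  static String callInterceptor(int i) {
    return "response-" + i;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<String>> futures = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      final int id = i;
      // Hit the interceptor from many threads at once.
      futures.add(pool.submit(() -> callInterceptor(id)));
    }
    for (Future<String> f : futures) {
      f.get(); // surfaces any failure from a concurrent call
    }
    pool.shutdown();
  }
}
{code}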

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276.000.patch, YARN-7276.001.patch, 
> YARN-7276.002.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213194#comment-16213194
 ] 

Hadoop QA commented on YARN-7372:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 47s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 15s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7372 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893295/YARN-7372.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 118a676353a1 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 6b7c87c |
| Default Java | 1.8.0_131 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/18058/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18058/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18058/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Assigned] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan reassigned YARN-7276:


Assignee: Íñigo Goiri  (was: Giovanni Matteo Fumarola)

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276.000.patch, YARN-7276.001.patch, 
> YARN-7276.002.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-10-20 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213177#comment-16213177
 ] 

Andrew Wang commented on YARN-6142:
---

Hi Ray, what is left to do here? Is it tracking towards completion by the end 
of the month?

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213173#comment-16213173
 ] 

Hadoop QA commented on YARN-7276:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 55s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
0s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7276 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893303/YARN-7276.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 67718d83eb09 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 6b7c87c |
| Default Java | 1.8.0_131 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18059/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18059/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Giovanni Matteo Fumarola
> 

[jira] [Commented] (YARN-7178) Add documentation for Container Update API

2017-10-20 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213164#comment-16213164
 ] 

Andrew Wang commented on YARN-7178:
---

Ping, is this one tracking towards completion by the end of the month? It's 
marked as a blocker.

> Add documentation for Container Update API
> --
>
> Key: YARN-7178
> URL: https://issues.apache.org/jira/browse/YARN-7178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7355) TestDistributedShell should be scheduler agnostic

2017-10-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213160#comment-16213160
 ] 

Haibo Chen commented on YARN-7355:
--

Thanks @Yufei for the review!

> TestDistributedShell should be scheduler agnostic 
> --
>
> Key: YARN-7355
> URL: https://issues.apache.org/jira/browse/YARN-7355
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7355.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7351) High CPU usage issue in RegistryDNS

2017-10-20 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212937#comment-16212937
 ] 

Eric Yang edited comment on YARN-7351 at 10/20/17 7:55 PM:
---

-1: after applying patch 003, queries started failing when it is used in 
combination with the patch for YARN-7326.

{code}
[yarn@eyang-1 hadoop-3.1.0-SNAPSHOT]$ dig @localhost -p 5353 .
;; Warning: query response not set

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> @localhost -p 5353 .
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 48353
;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; Query time: 9 msec
;; SERVER: 127.0.0.1#5353(127.0.0.1)
;; WHEN: Fri Oct 20 19:49:49 UTC 2017
;; MSG SIZE  rcvd: 12
{code}

This is because the response payload is bigger than a UDP datagram.  The TCP 
channel for the response works using the initialized NIOTCPChannel.


was (Author: eyang):
+1 for disabling TCP channel for now.

> High CPU usage issue in RegistryDNS
> ---
>
> Key: YARN-7351
> URL: https://issues.apache.org/jira/browse/YARN-7351
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7351.yarn-native-services.01.patch, 
> YARN-7351.yarn-native-services.02.patch, 
> YARN-7351.yarn-native-services.03.patch, 
> YARN-7351.yarn-native-services.03.patch
>
>
> Thanks [~aw] for finding this issue.
> The current RegistryDNS implementation is always running on high CPU and 
> pretty much eats one core. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7355) TestDistributedShell should be scheduler agnostic

2017-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213119#comment-16213119
 ] 

Hudson commented on YARN-7355:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13118 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13118/])
YARN-7355. TestDistributedShell should be scheduler agnostic. (yufei: rev 
6b7c87c94592606966a4229313b3d0da48f16158)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java


> TestDistributedShell should be scheduler agnostic 
> --
>
> Key: YARN-7355
> URL: https://issues.apache.org/jira/browse/YARN-7355
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7355.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories

2017-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213117#comment-16213117
 ] 

Hudson commented on YARN-7353:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13118 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13118/])
YARN-7353. Improved volume mount check for directories and unit test (eyang: 
rev b61144a93d9306624378a93944d0a08c60436554)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc


> Docker permitted volumes don't properly check for directories
> -
>
> Key: YARN-7353
> URL: https://issues.apache.org/jira/browse/YARN-7353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-7353.001.patch, YARN-7353.002.patch, 
> YARN-7353.003.patch
>
>
> {noformat:title=docker-util.c:check_mount_permitted()}
> // directory check
> permitted_mount_len = strlen(permitted_mounts[i]);
> if (permitted_mount_len > 0
> && permitted_mounts[i][permitted_mount_len - 1] == '/') {
>   if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) 
> == 0) {
> ret = 1;
> break;
>   }
> }
> {noformat}
> This code will treat "/home/" as a directory, but not "/home"
> {noformat}
> [  FAILED  ] 3 tests, listed below:
> [  FAILED  ] TestDockerUtil.test_check_mount_permitted
> [  FAILED  ] TestDockerUtil.test_normalize_mounts
> [  FAILED  ] TestDockerUtil.test_add_rw_mounts
> {noformat}
> Additionally, YARN-6623 introduced new test failures in the C++ 
> container-executor test "cetest"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7261) Add debug message for better download latency monitoring

2017-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213118#comment-16213118
 ] 

Hudson commented on YARN-7261:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13118 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13118/])
YARN-7261. Add debug message for better download latency monitoring. (yufei: 
rev 0799fde35e7f3b9e8a85284ac0b30f6bdcbffad1)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


> Add debug message for better download latency monitoring
> 
>
> Key: YARN-7261
> URL: https://issues.apache.org/jira/browse/YARN-7261
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7261.001.patch, YARN-7261.002.patch, 
> YARN-7261.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler

2017-10-20 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-7374:
---
Attachment: YARN-7374.001.patch

> Improve performance of DRF comparisons for resource types in fair scheduler
> ---
>
> Key: YARN-7374
> URL: https://issues.apache.org/jira/browse/YARN-7374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-7374.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7374) Improve performance of DRF comparisons for resource types in fair scheduler

2017-10-20 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7374:
--

 Summary: Improve performance of DRF comparisons for resource types 
in fair scheduler
 Key: YARN-7374
 URL: https://issues.apache.org/jira/browse/YARN-7374
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: 3.1.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Critical






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations

2017-10-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213028#comment-16213028
 ] 

Haibo Chen commented on YARN-4511:
--

YARN-7373 is created for the container update atomicity discussion.

> Common scheduler changes supporting scheduler-specific implementations
> --
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Haibo Chen
> Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, 
> YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, 
> YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7373) The atomicity of container update in RM is not clear

2017-10-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213026#comment-16213026
 ] 

Haibo Chen commented on YARN-7373:
--

[~asuresh] Can you please provide some background and details of container 
update?
The atomicity is not clear to us in terms of how it is guaranteed. Our concern 
is that
another container allocation may come in between the two containerUpdated() calls
and there are not enough resources available for the allocation.

> The atomicity of container update in RM is not clear
> 
>
> Key: YARN-7373
> URL: https://issues.apache.org/jira/browse/YARN-7373
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that  
> {code:java}
> 342   // notify schedulerNode of the update to correct resource accounting
> 343   node.containerUpdated(existingRMContainer, existingContainer);
> 344   
> 345   
> ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
> 346   // notify SchedulerNode of the update to correct resource accounting
> 347   node.containerUpdated(tempRMContainer, tempContainer);
> 348   
> {code}
> bq. I think that it would be nicer to lock around these two calls to become 
> atomic.
> Container update, and thus container swap as part of that, is atomic 
> according to [~asuresh].
> It'd be nice to discuss this in more detail to see if we want to be more 
> conservative.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7373) The atomicity of container update in RM is not clear

2017-10-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7373:
-
Description: 
While reviewing YARN-4511, Miklos noticed that  
{code:java}
342 // notify schedulerNode of the update to correct resource accounting
343 node.containerUpdated(existingRMContainer, existingContainer);
344 
345 
((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
346 // notify SchedulerNode of the update to correct resource accounting
347 node.containerUpdated(tempRMContainer, tempContainer);
348 
{code}
bq. I think that it would be nicer to lock around these two calls to become 
atomic.

Container update, and thus container swap as part of that, is atomic according 
to [~asuresh].
It'd be nice to discuss this in more detail to see if we want to be more 
conservative.

  was:
While reviewing YARN-4511, Miklos pointed out that  
{code:java}
342 // notify schedulerNode of the update to correct resource accounting
343 node.containerUpdated(existingRMContainer, existingContainer);
344 
345 
((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
346 // notify SchedulerNode of the update to correct resource accounting
347 node.containerUpdated(tempRMContainer, tempContainer);
348 
{code}
bq. I think that it would be nicer to lock around these two calls to become 
atomic.


> The atomicity of container update in RM is not clear
> 
>
> Key: YARN-7373
> URL: https://issues.apache.org/jira/browse/YARN-7373
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that  
> {code:java}
> 342   // notify schedulerNode of the update to correct resource accounting
> 343   node.containerUpdated(existingRMContainer, existingContainer);
> 344   
> 345   
> ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
> 346   // notify SchedulerNode of the update to correct resource accounting
> 347   node.containerUpdated(tempRMContainer, tempContainer);
> 348   
> {code}
> bq. I think that it would be nicer to lock around these two calls to become 
> atomic.
> Container update, and thus container swap as part of that, is atomic 
> according to [~asuresh].
> It'd be nice to discuss this in more detail to see if we want to be more 
> conservative.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7370) Intra-queue preemption properties should be refreshable

2017-10-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213017#comment-16213017
 ] 

Wangda Tan commented on YARN-7370:
--

[~GergelyNovak], you're very much welcome to take up this task; this is quite 
helpful and important for users of preemption.

I agree with Eric, it's better to include this in the {{-refreshQueues}} op so we 
don't need any changes to the RMAdmin protocol and CLI. To me the requirements are:

- Handle changes to {{SchedulingEditPolicy}} configs including preemption 
(which means {{SchedulingMonitor}} should be refreshable as well).
- All preemption-related parameters.

[~eepayne]/[~sunilg], please feel free to add any requirements you have in mind.

> Intra-queue preemption properties should be refreshable
> ---
>
> Key: YARN-7370
> URL: https://issues.apache.org/jira/browse/YARN-7370
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.8.0, 3.0.0-alpha3
>Reporter: Eric Payne
>
> At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} 
> should be refreshable. It would also be nice to make 
> {{intra-queue-preemption.enabled}} and {{preemption-order-policy}} 
> refreshable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7373) The atomicity of container update in RM is not clear

2017-10-20 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7373:


 Summary: The atomicity of container update in RM is not clear
 Key: YARN-7373
 URL: https://issues.apache.org/jira/browse/YARN-7373
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Haibo Chen
Assignee: Haibo Chen


While reviewing YARN-4511, Miklos pointed out that  
{code:java}
342 // notify schedulerNode of the update to correct resource accounting
343 node.containerUpdated(existingRMContainer, existingContainer);
344 
345 
((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
346 // notify SchedulerNode of the update to correct resource accounting
347 node.containerUpdated(tempRMContainer, tempContainer);
348 
{code}
bq. I think that it would be nicer to lock around these two calls to become 
atomic.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations

2017-10-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213013#comment-16213013
 ] 

Haibo Chen commented on YARN-4511:
--

bq. If containerResourceAllocated fails in guaranteedContainerResourceAllocated 
we will still call allocatedContainers.put(). I think this may cause some 
inconsistencies in the future. Probably it is better to propagate the false 
return code all the way to the caller.
bq. guaranteedContainerResourceReleased may fail inside but regardless of the 
outcome, we decrease numGuaranteedContainers.
These two reflect the current behavior without the patch. The resource release can 
fail only if the resource is null, in which case it is equivalent to releasing a 
zero-sized container, but it won't cause any inconsistency. 

bq.  I think that it would be nicer to lock around these two calls to become 
atomic.
That's a valid concern. Container update, and thus swap, is atomic according to 
[~asuresh]. But that is indeed not very clear. Let's discuss this in another 
jira to see if we can improve it.
 
Will address the rest of your comments in the next patch plus unit tests.


> Common scheduler changes supporting scheduler-specific implementations
> --
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Haibo Chen
> Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, 
> YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, 
> YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4163) Audit getQueueInfo and getApplications calls

2017-10-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213011#comment-16213011
 ] 

Jason Lowe commented on YARN-4163:
--

Thanks for updating the patch!

+1 lgtm.


> Audit getQueueInfo and getApplications calls
> 
>
> Key: YARN-4163
> URL: https://issues.apache.org/jira/browse/YARN-4163
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4163.004.patch, YARN-4163.005.patch, 
> YARN-4163.006.branch-2.8.patch, YARN-4163.006.patch, YARN-4163.007.patch, 
> YARN-4163.2.patch, YARN-4163.2.patch, YARN-4163.3.patch, YARN-4163.patch
>
>
> getQueueInfo and getApplications seem to sometimes cause spikes of load, but 
> we are not able to confirm this because they are not audit logged. This patch 
> proposes to add them to the audit log.
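
For context, a minimal sketch of what auditing such a call could look like; the operation string and class are made up for illustration and this is not the actual patch, it only assumes the existing {{RMAuditLogger.logSuccess}} helper:

{code:java}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger;

// Hypothetical sketch: record a successful getQueueInfo call in the RM audit
// log, mirroring how other ClientRMService operations are audited.
public final class QueueInfoAuditExample {
  public static void auditGetQueueInfo() throws IOException {
    UserGroupInformation caller = UserGroupInformation.getCurrentUser();
    RMAuditLogger.logSuccess(caller.getShortUserName(),
        "Get Queue Info Request", "ClientRMService");
  }
}
{code}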



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7217) Improve API service usability for updating service spec and state

2017-10-20 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213006#comment-16213006
 ] 

Eric Yang edited comment on YARN-7217 at 10/20/17 6:24 PM:
---

- Fixed actionBuild to deploy config to solr.
- Fixed the PUT method for service state to be in sync with code from HEAD of 
yarn-native-services.
- Fixed Solr version definition in hadoop-project/pom.xml


was (Author: eyang):
- Fixed actionBuild to deploy config to solr.
- Fixed PUT method for state for service to be in sync with code from HEAD of 
yarn-native-services.


> Improve API service usability for updating service spec and state
> -
>
> Key: YARN-7217
> URL: https://issues.apache.org/jira/browse/YARN-7217
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, applications
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7217.yarn-native-services.001.patch, 
> YARN-7217.yarn-native-services.002.patch, 
> YARN-7217.yarn-native-services.003.patch, 
> YARN-7217.yarn-native-services.004.patch, 
> YARN-7217.yarn-native-services.005.patch
>
>
> The API service for deploying and managing YARN services has several limitations.
> {{updateService}} API provides multiple functions:
> # Stopping a service.
> # Start a service.
> # Increase or decrease number of containers.  (This was removed in YARN-7323).
> The overloading is buggy depending on how the configuration should be applied.
> h4. Scenario 1
> A user retrieves Service object from getService call, and the Service object 
> contains state: STARTED.  The user would like to increase number of 
> containers for the deployed service.  The JSON has been updated to increase 
> container count.  The PUT method does not actually increase container count.
> h4. Scenario 2
> A user retrieves Service object from getService call, and the Service object 
> contains state: STOPPED.  The user would like to make an environment 
> configuration change.  The configuration does not get updated after the PUT 
> method.
> This is possible to address by rearranging the logic of START/STOP after the 
> configuration update.  However, there are other potential combinations that 
> can break the PUT method.  For example, a user may want to make configuration 
> changes but not restart the service until a later time.
> h4. Scenario 3
> There is no API to list all deployed applications by the same user.
> h4. Scenario 4
> Desired state (spec) and current state are represented by the same Service 
> object.  There is no easy way to tell whether "state" is the desired state to 
> reach or the current state of the service.  It would be nice to have the 
> ability to retrieve both the desired state and the current state through 
> separate entry points.  Implementing /spec and /state resolves this problem.
> h4. Scenario 5
> Listing all services deployed by the same user can trigger a directory listing 
> operation on the namenode if HDFS is used as storage for metadata.  When hundreds 
> of users use the Service UI to view or deploy applications, this amounts to a 
> denial-of-service attack on the namenode.  The sparse small metadata files also 
> reduce the efficiency of namenode memory usage.  Hence, a cache layer for storing 
> service metadata can reduce namenode stress.
> h3. Proposed change
> ApiService can separate the PUT method into two PUT methods for configuration 
> changes vs operation changes.  New API could look like:
> {code}
> @PUT
> /ws/v1/services/[service_name]/spec
> Request Data:
> {
>   "name": "amp",
>   "components": [
> {
>   "name": "mysql",
>   "number_of_containers": 2,
>   "artifact": {
> "id": "centos/mysql-57-centos7:latest",
> "type": "DOCKER"
>   },
>   "run_privileged_container": false,
>   "launch_command": "",
>   "resource": {
> "cpus": 1,
> "memory": "2048"
>   },
>   "configuration": {
> "env": {
>   "MYSQL_USER":"${USER}",
>   "MYSQL_PASSWORD":"password"
> }
>   }
>  }
>   ],
>   "quicklinks": {
> "Apache Document Root": 
> "http://httpd.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/;,
> "PHP MyAdmin": "http://phpmyadmin.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/;
>   }
> }
> {code}
> {code}
> @PUT
> /ws/v1/services/[service_name]/state
> Request data:
> {
>   "name": "amp",
>   "components": [
> {
>   "name": "mysql",
>   "state": "STOPPED"
>  }
>   ]
> }
> {code}
> SOLR can be used to cache Yarnfile to improve lookup performance and reduce 
> stress of namenode small file problems and high frequency lookup.  SOLR is 
> chosen for caching metadata because its indexing feature can be used to build 
> full text search for application catalog as well.
> For service that requires configuration changes to increase or 

[jira] [Updated] (YARN-7217) Improve API service usability for updating service spec and state

2017-10-20 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7217:

Attachment: YARN-7217.yarn-native-services.005.patch

- Fixed actionBuild to deploy config to solr.
- Fixed PUT method for state for service to be in sync with code from HEAD of 
yarn-native-services.
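
For reference, a minimal JAX-RS sketch of the proposed /spec vs. /state split 
(class, method, and payload names are placeholders for illustration, not the 
actual yarn-native-services resource):

{code:java}
import javax.ws.rs.Consumes;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

/**
 * Illustrative only: separates spec updates from state transitions so that a
 * configuration change cannot be silently dropped by a START/STOP request.
 */
@Path("/ws/v1/services/{service_name}")
public class ServiceApiSketch {

  /** Persist the desired spec (container counts, env, artifacts) without starting/stopping. */
  @PUT
  @Path("/spec")
  @Consumes(MediaType.APPLICATION_JSON)
  public Response updateSpec(@PathParam("service_name") String name, String specJson) {
    // store specJson as the desired state, e.g. in the metadata cache
    return Response.noContent().build();
  }

  /** Change only the lifecycle state (STARTED/STOPPED) of the service or a component. */
  @PUT
  @Path("/state")
  @Consumes(MediaType.APPLICATION_JSON)
  public Response updateState(@PathParam("service_name") String name, String stateJson) {
    // trigger start/stop against the stored spec
    return Response.noContent().build();
  }
}
{code}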


> Improve API service usability for updating service spec and state
> -
>
> Key: YARN-7217
> URL: https://issues.apache.org/jira/browse/YARN-7217
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, applications
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7217.yarn-native-services.001.patch, 
> YARN-7217.yarn-native-services.002.patch, 
> YARN-7217.yarn-native-services.003.patch, 
> YARN-7217.yarn-native-services.004.patch, 
> YARN-7217.yarn-native-services.005.patch
>
>
> The API service for deploying and managing YARN services has several limitations.
> {{updateService}} API provides multiple functions:
> # Stopping a service.
> # Start a service.
> # Increase or decrease number of containers.  (This was removed in YARN-7323).
> The overloading is buggy depending on how the configuration should be applied.
> h4. Scenario 1
> A user retrieves Service object from getService call, and the Service object 
> contains state: STARTED.  The user would like to increase number of 
> containers for the deployed service.  The JSON has been updated to increase 
> container count.  The PUT method does not actually increase container count.
> h4. Scenario 2
> A user retrieves Service object from getService call, and the Service object 
> contains state: STOPPED.  The user would like to make an environment 
> configuration change.  The configuration does not get updated after the PUT 
> method.
> This is possible to address by rearranging the logic of START/STOP after the 
> configuration update.  However, there are other potential combinations that 
> can break the PUT method.  For example, a user may want to make configuration 
> changes but not restart the service until a later time.
> h4. Scenario 3
> There is no API to list all deployed applications by the same user.
> h4. Scenario 4
> Desired state (spec) and current state are represented by the same Service 
> object.  There is no easy way to tell whether "state" is the desired state to 
> reach or the current state of the service.  It would be nice to have the 
> ability to retrieve both the desired state and the current state through 
> separate entry points.  Implementing /spec and /state resolves this problem.
> h4. Scenario 5
> Listing all services deployed by the same user can trigger a directory listing 
> operation on the namenode if HDFS is used as storage for metadata.  When hundreds 
> of users use the Service UI to view or deploy applications, this amounts to a 
> denial-of-service attack on the namenode.  The sparse small metadata files also 
> reduce the efficiency of namenode memory usage.  Hence, a cache layer for storing 
> service metadata can reduce namenode stress.
> h3. Proposed change
> ApiService can separate the PUT method into two PUT methods for configuration 
> changes vs operation changes.  New API could look like:
> {code}
> @PUT
> /ws/v1/services/[service_name]/spec
> Request Data:
> {
>   "name": "amp",
>   "components": [
> {
>   "name": "mysql",
>   "number_of_containers": 2,
>   "artifact": {
> "id": "centos/mysql-57-centos7:latest",
> "type": "DOCKER"
>   },
>   "run_privileged_container": false,
>   "launch_command": "",
>   "resource": {
> "cpus": 1,
> "memory": "2048"
>   },
>   "configuration": {
> "env": {
>   "MYSQL_USER":"${USER}",
>   "MYSQL_PASSWORD":"password"
> }
>   }
>  }
>   ],
>   "quicklinks": {
> "Apache Document Root": 
> "http://httpd.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/;,
> "PHP MyAdmin": "http://phpmyadmin.${SERVICE_NAME}.${USER}.${DOMAIN}:8080/;
>   }
> }
> {code}
> {code}
> @PUT
> /ws/v1/services/[service_name]/state
> Request data:
> {
>   "name": "amp",
>   "components": [
> {
>   "name": "mysql",
>   "state": "STOPPED"
>  }
>   ]
> }
> {code}
> SOLR can be used to cache Yarnfile to improve lookup performance and reduce 
> stress of namenode small file problems and high frequency lookup.  SOLR is 
> chosen for caching metadata because its indexing feature can be used to build 
> full text search for application catalog as well.
> For a service that requires configuration changes to increase or decrease node 
> count, the calling sequence is:
> {code}
> # GET /ws/v1/services/{service_name}/spec
> # Change number_of_containers to desired number.
> # PUT /ws/v1/services/{service_name}/spec to update the spec.
> # PUT /ws/v1/services/{service_name}/state to 

[jira] [Commented] (YARN-7355) TestDistributedShell should be scheduler agnostic

2017-10-20 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212989#comment-16212989
 ] 

Yufei Gu commented on YARN-7355:


+1.

> TestDistributedShell should be scheduler agnostic 
> --
>
> Key: YARN-7355
> URL: https://issues.apache.org/jira/browse/YARN-7355
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7355.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7169) Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2)

2017-10-20 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212988#comment-16212988
 ] 

Vrushali C commented on YARN-7169:
--

Looking at the last two builds, I think things are looking good for the patch. 
The HDFS test timeouts are unrelated.

I will proceed with the merge to branch-2.

> Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2)
> --
>
> Key: YARN-7169
> URL: https://issues.apache.org/jira/browse/YARN-7169
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>Priority: Critical
> Attachments: FlowRunDetails_Sleepjob.png, Metrics_Yarn_UI.png, 
> YARN-7169-YARN-3368_branch2.0001.patch, 
> YARN-7169-YARN-5355_branch2.0001.patch, 
> YARN-7169-YARN-5355_branch2.0002.patch, 
> YARN-7169-YARN-5355_branch2.0003.patch, 
> YARN-7169-YARN-5355_branch2.0004.patch, 
> YARN-7169-YARN-5355_branch2.0004.patch, YARN-7169-branch-2.0001.patch, 
> YARN-7169-branch-2.0002.patch, ui_commits(1)
>
>
> Jira to track the backport of the new yarn-ui onto branch2. Right now adding 
> into Timeline Service v2's branch2 which is YARN-5355_branch2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories

2017-10-20 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212977#comment-16212977
 ] 

Eric Badger commented on YARN-7353:
---

Thanks, [~eyang]!

> Docker permitted volumes don't properly check for directories
> -
>
> Key: YARN-7353
> URL: https://issues.apache.org/jira/browse/YARN-7353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-7353.001.patch, YARN-7353.002.patch, 
> YARN-7353.003.patch
>
>
> {noformat:title=docker-util.c:check_mount_permitted()}
> // directory check
> permitted_mount_len = strlen(permitted_mounts[i]);
> if (permitted_mount_len > 0
> && permitted_mounts[i][permitted_mount_len - 1] == '/') {
>   if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) 
> == 0) {
> ret = 1;
> break;
>   }
> }
> {noformat}
> This code will treat "/home/" as a directory, but not "/home"
> {noformat}
> [  FAILED  ] 3 tests, listed below:
> [  FAILED  ] TestDockerUtil.test_check_mount_permitted
> [  FAILED  ] TestDockerUtil.test_normalize_mounts
> [  FAILED  ] TestDockerUtil.test_add_rw_mounts
> {noformat}
> Additionally, YARN-6623 introduced new test failures in the C++ 
> container-executor test "cetest"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations

2017-10-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212971#comment-16212971
 ] 

Haibo Chen commented on YARN-4511:
--

bq. however we need to make sure it reflects the state of the object, so for 
example allocateContainer() should set this value as the last step after the 
allocatedContainers.put() call. 
bq. containerResourceReleased should decrease resourceAllocatedPendingLaunch, 
if the container has not been started, yet.
Good points, will address in the next patch.
bq. I think that it would be nicer to lock around these two calls to become 
atomic.
swapContainer() is already protected by a writeLock, so it is already atomic, 
no?

bq. isValidGuaranteedContainer and isValidOpportunisticContainer contain the 
same code. Should they be different? 
I'm inclined to keep both of them. The caller may want to check whether a 
container is guaranteed or opportunistic, not just whether it has been allocated 
on the node. It just so happens that we are sharing the same map for both 
OPPORTUNISTIC and GUARANTEED containers, hence the code is identical.
I'll add an ExecutionType check to be more rigorous.

bq. allocatedContainers.remove(containerId); can be placed outside the if.
{code:java}
if (container.getExecutionType() == ExecutionType.GUARANTEED) {
  guaranteedContainerResourceReleased(container);
  numGuaranteedContainers--;
} else {
  opportunisticContainerResourceReleased(container);
  numOpportunisticContainers--;
}
allocatedContainers.remove(containerId);
{code}
The above code updates the num*Containers counters before 
allocatedContainers is updated, so I think we should keep it as is.
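
Roughly what I mean by adding the ExecutionType check (an illustrative sketch with 
made-up names and a local enum standing in for YARN's ExecutionType, not the actual 
patch code):

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch: validity check that also verifies the execution type. */
public class ValidContainerSketch {

  enum ExecType { GUARANTEED, OPPORTUNISTIC }   // stand-in for YARN's ExecutionType

  static final class ContainerInfo {
    final ExecType execType;
    ContainerInfo(ExecType execType) { this.execType = execType; }
  }

  private final Map<String, ContainerInfo> allocatedContainers = new HashMap<>();

  boolean isValidGuaranteedContainer(String containerId) {
    ContainerInfo c = allocatedContainers.get(containerId);
    return c != null && c.execType == ExecType.GUARANTEED;
  }

  boolean isValidOpportunisticContainer(String containerId) {
    ContainerInfo c = allocatedContainers.get(containerId);
    return c != null && c.execType == ExecType.OPPORTUNISTIC;
  }

  public static void main(String[] args) {
    ValidContainerSketch node = new ValidContainerSketch();
    node.allocatedContainers.put("c1", new ContainerInfo(ExecType.GUARANTEED));
    System.out.println(node.isValidGuaranteedContainer("c1"));     // true
    System.out.println(node.isValidOpportunisticContainer("c1"));  // false
  }
}
{code}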

> Common scheduler changes supporting scheduler-specific implementations
> --
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Haibo Chen
> Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, 
> YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, 
> YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7326) Some issues in RegistryDNS

2017-10-20 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212967#comment-16212967
 ] 

Eric Yang commented on YARN-7326:
-

[~jianhe] I will add comments for the updateDNSServer method to describe what it 
does.  For testing, try:

{code}
dig @localhost -p 5353 .
dig @localhost -p 5353 google.com.
{code}
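
If it helps, here is a rough Java smoke test that drives the same dig query and checks 
the answer (names are made up; it assumes dig is on the PATH and RegistryDNS is 
listening on port 5353 as above, and it is not part of the patch):

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Rough manual smoke test for a local RegistryDNS instance; assumes `dig` is installed. */
public class RegistryDnsSmokeTest {

  public static void main(String[] args) throws Exception {
    // Query the root zone the same way as the dig command above.
    Process p = new ProcessBuilder("dig", "@localhost", "-p", "5353", ".")
        .redirectErrorStream(true)
        .start();
    StringBuilder out = new StringBuilder();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        out.append(line).append('\n');
      }
    }
    p.waitFor();
    System.out.println(out);
    // A healthy server should not report NOTAUTH for the root zone query.
    if (out.toString().contains("status: NOTAUTH")) {
      System.err.println("RegistryDNS still fails on the root zone query");
    }
  }
}
{code}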

> Some issues in RegistryDNS
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7326.yarn-native-services.001.patch, 
> YARN-7326.yarn-native-services.002.patch
>
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7326) Some issues in RegistryDNS

2017-10-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212950#comment-16212950
 ] 

Jian He commented on YARN-7326:
---

[~eyang],
I'm not familiar with the Java DNS libs. Could you add some comments in the 
code to explain what the new method is doing, like the updateDNSServer method? 
It'll be useful for people who aren't familiar with these libs to understand.

And how can I test this change?

> Some issues in RegistryDNS
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7326.yarn-native-services.001.patch, 
> YARN-7326.yarn-native-services.002.patch
>
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7169) Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2)

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212943#comment-16212943
 ] 

Hadoop QA commented on YARN-7169:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
22s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
52s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
12s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 14m 
17s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-assemblies hadoop-yarn-project/hadoop-yarn . 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
29s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
45s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
28s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  9m 
22s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  9m 22s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 1s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
10s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
4s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-assemblies hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 27m 43s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ha.TestZKFailoverController |
| Timed out junit tests | org.apache.hadoop.http.TestHttpServer |
|   | org.apache.hadoop.log.TestLogLevel |
\\
\\
|| Subsystem || Report/Notes ||
| Docker 

[jira] [Commented] (YARN-7351) High CPU usage issue in RegistryDNS

2017-10-20 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212937#comment-16212937
 ] 

Eric Yang commented on YARN-7351:
-

+1 for disabling TCP channel for now.

> High CPU usage issue in RegistryDNS
> ---
>
> Key: YARN-7351
> URL: https://issues.apache.org/jira/browse/YARN-7351
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7351.yarn-native-services.01.patch, 
> YARN-7351.yarn-native-services.02.patch, 
> YARN-7351.yarn-native-services.03.patch, 
> YARN-7351.yarn-native-services.03.patch
>
>
> Thanks [~aw] for finding this issue.
> The current RegistryDNS implementation is always running on high CPU and 
> pretty much eats one core. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7243) Moving logging APIs over to slf4j in hadoop-yarn-server-resourcemanager

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212922#comment-16212922
 ] 

Hadoop QA commented on YARN-7243:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 65 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
31s{color} | {color:green} root generated 0 new + 1251 unchanged - 5 fixed = 
1251 total (was 1256) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 57s{color} | {color:orange} root: The patch generated 12 new + 3750 
unchanged - 29 fixed = 3762 total (was 3779) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
47s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 17s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7243 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893180/YARN-7243.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c2649948c163 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git 

[jira] [Commented] (YARN-7261) Add debug message for better download latency monitoring

2017-10-20 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212892#comment-16212892
 ] 

Yufei Gu commented on YARN-7261:


Thanks for the review, [~xiaochen]. Committed to trunk, branch-3.0 and branch-2.
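
For anyone wondering what such a debug message looks like, here is a generic sketch 
of the kind of timing instrumentation this sort of change adds (SLF4J logging and a 
placeholder download step; this is not the actual FSDownload code from the patch):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Generic sketch of timing a resource download and logging it at DEBUG level. */
public class DownloadTimingSketch {

  private static final Logger LOG = LoggerFactory.getLogger(DownloadTimingSketch.class);

  static void download(String resource) throws InterruptedException {
    long start = System.nanoTime();
    Thread.sleep(50);   // stand-in for the actual copy/unpack work
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    if (LOG.isDebugEnabled()) {
      LOG.debug("Downloaded {} in {} ms", resource, elapsedMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    download("hdfs://nn/apps/app1/job.jar");
  }
}
{code}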

> Add debug message for better download latency monitoring
> 
>
> Key: YARN-7261
> URL: https://issues.apache.org/jira/browse/YARN-7261
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7261.001.patch, YARN-7261.002.patch, 
> YARN-7261.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7276) Federation Router Web Service fixes

2017-10-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7276:
--
Attachment: YARN-7276.002.patch

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-7276.000.patch, YARN-7276.001.patch, 
> YARN-7276.002.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7261) Add debug message for better download latency monitoring

2017-10-20 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-7261:
---
Summary: Add debug message for better download latency monitoring  (was: 
Add debug message in class FSDownload for better download latency monitoring)

> Add debug message for better download latency monitoring
> 
>
> Key: YARN-7261
> URL: https://issues.apache.org/jira/browse/YARN-7261
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7261.001.patch, YARN-7261.002.patch, 
> YARN-7261.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7261) Add debug message in class FSDownload for better download latency monitoring

2017-10-20 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212874#comment-16212874
 ] 

Xiao Chen commented on YARN-7261:
-

+1 on patch 3, thanks Yufei!

> Add debug message in class FSDownload for better download latency monitoring
> 
>
> Key: YARN-7261
> URL: https://issues.apache.org/jira/browse/YARN-7261
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7261.001.patch, YARN-7261.002.patch, 
> YARN-7261.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7357) Several methods in TestZKRMStateStore.TestZKRMStateStoreTester.TestZKRMStateStoreInternal should have @Override annotations

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212856#comment-16212856
 ] 

Hadoop QA commented on YARN-7357:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 49s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}117m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue |
|   | 
hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService 
|
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7357 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893169/YARN-7357.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a6ecef593760 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1f4cdf1 |
| Default Java | 1.8.0_131 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/18057/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18057/testReport/ |
| modules | C: 

[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations

2017-10-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212848#comment-16212848
 ] 

Miklos Szegedi commented on YARN-4511:
--

Thank you, [~haibochen] for the patch.
{code}
342 // notify schedulerNode of the update to correct resource accounting
343 node.containerUpdated(existingRMContainer, existingContainer);
344 
345 
((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
346 // notify SchedulerNode of the update to correct resource accounting
347 node.containerUpdated(tempRMContainer, tempContainer);
348 
{code}
I think that it would be nicer to lock around these two calls to become atomic.
{code}
431   public int getNumOpportunisticContainers() {
432 return numOpportunisticContainers;
321   }
{code}
This function takes a sample but does not lock. This is fine, however we need 
to make sure it reflects the state of the object, so for example 
allocateContainer() should set this value as the last step after the 
allocatedContainers.put() call.
If containerResourceAllocated fails in guaranteedContainerResourceAllocated we 
will still call allocatedContainers.put(). I think this may cause some 
inconsistencies in the future. Probably it is better to propagate the false 
return code all the way to the caller.
isValidGuaranteedContainer and isValidOpportunisticContainer contain the same 
code. Should they be different? Would an isValidContainer function be 
sufficient?
{code}
294 Container container = rmContainer.getContainer();
295 if (container.getExecutionType() == ExecutionType.GUARANTEED) {
296   guaranteedContainerResourceReleased(container);
297   allocatedContainers.remove(containerId);
298   numGuaranteedContainers--;
299 } else {
300   opportunisticContainerResourceReleased(container);
301   numOpportunisticContainers--;
302   allocatedContainers.remove(containerId);
303 }
{code}
allocatedContainers.remove(containerId); can be placed outside the if.

containerResourceReleased should decrease resourceAllocatedPendingLaunch, if 
the container has not been started, yet.

guaranteedContainerResourceReleased may fail inside but regardless of the 
outcome, we decrease numGuaranteedContainers.
{{ + ", which has " + getNumGuaranteedContainers() + " containers, "}} should 
be {{ + ", which has " + getNumGuaranteedContainers() + " guaranteed 
containers, "}}
I do not see unit tests added for getNumOpportunisticContainers() or for the 
opportunistic container code paths added in general.
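
On the locking point, the kind of change I have in mind (an illustrative sketch with 
made-up names, not actual YARN code): take one write lock around both updates so a 
reader never observes a half-applied swap.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative sketch: make two related updates appear atomic to readers. */
public class AtomicSwapSketch {

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private String existingContainer = "c1-v1";
  private String tempContainer = "c2-v1";

  void swapContainers(String newExisting, String newTemp) {
    lock.writeLock().lock();
    try {
      // both updates happen under the same write lock
      existingContainer = newExisting;
      tempContainer = newTemp;
    } finally {
      lock.writeLock().unlock();
    }
  }

  String readState() {
    lock.readLock().lock();
    try {
      return existingContainer + " / " + tempContainer;
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    AtomicSwapSketch node = new AtomicSwapSketch();
    node.swapContainers("c1-v2", "c2-v2");
    System.out.println(node.readState());
  }
}
{code}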


> Common scheduler changes supporting scheduler-specific implementations
> --
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Haibo Chen
> Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, 
> YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, 
> YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories

2017-10-20 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212845#comment-16212845
 ] 

Eric Yang commented on YARN-7353:
-

Thank you [~ebadger].  The test passes on CentOS 7.

+1

I just committed this.
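
For completeness, the directory-matching rule being fixed here reads roughly like 
this when written out in Java form (a sketch of the rule only; the committed fix 
lives in the C container-executor code):

{code:java}
/** Sketch of the permitted-mount rule: "/home" should admit "/home" itself and anything under "/home/". */
public class MountCheckSketch {

  static boolean isMountPermitted(String normalizedPath, String permittedMount) {
    // strip a single trailing slash so "/home" and "/home/" behave the same
    String dir = permittedMount.endsWith("/") && permittedMount.length() > 1
        ? permittedMount.substring(0, permittedMount.length() - 1)
        : permittedMount;
    return normalizedPath.equals(dir) || normalizedPath.startsWith(dir + "/");
  }

  public static void main(String[] args) {
    System.out.println(isMountPermitted("/home/user1", "/home"));   // true
    System.out.println(isMountPermitted("/home/user1", "/home/"));  // true
    System.out.println(isMountPermitted("/homework", "/home"));     // false
  }
}
{code}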

> Docker permitted volumes don't properly check for directories
> -
>
> Key: YARN-7353
> URL: https://issues.apache.org/jira/browse/YARN-7353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-7353.001.patch, YARN-7353.002.patch, 
> YARN-7353.003.patch
>
>
> {noformat:title=docker-util.c:check_mount_permitted()}
> // directory check
> permitted_mount_len = strlen(permitted_mounts[i]);
> if (permitted_mount_len > 0
> && permitted_mounts[i][permitted_mount_len - 1] == '/') {
>   if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) 
> == 0) {
> ret = 1;
> break;
>   }
> }
> {noformat}
> This code will treat "/home/" as a directory, but not "/home"
> {noformat}
> [  FAILED  ] 3 tests, listed below:
> [  FAILED  ] TestDockerUtil.test_check_mount_permitted
> [  FAILED  ] TestDockerUtil.test_normalize_mounts
> [  FAILED  ] TestDockerUtil.test_add_rw_mounts
> {noformat}
> Additionally, YARN-6623 introduced new test failures in the C++ 
> container-executor test "cetest"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7360) TestRM.testNMTokenSentForNormalContainer() should be scheduler agnostic

2017-10-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7360:
-
Summary: TestRM.testNMTokenSentForNormalContainer() should be scheduler 
agnostic  (was: TestRM.testNMTokenSentForNormalContainer() fails with Fair 
Scheduler)

> TestRM.testNMTokenSentForNormalContainer() should be scheduler agnostic
> ---
>
> Key: YARN-7360
> URL: https://issues.apache.org/jira/browse/YARN-7360
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7360.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7372) TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic is flaky

2017-10-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7372:
-
Attachment: YARN-7372.01.patch

Attaching a new patch to address the checkstyle indentation issue. The 
TestDistributedScheduler failure is YARN-7299.
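
One generic way to avoid this kind of race is to poll until the asynchronous update 
becomes visible instead of asserting immediately. An illustrative sketch (made-up 
names; the actual patch may handle it differently):

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative sketch: wait for an asynchronously applied update before asserting on it. */
public class WaitForUpdateSketch {

  static <T> boolean waitFor(AtomicReference<T> ref, T expected, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (expected.equals(ref.get())) {
        return true;            // the async dispatcher has applied the update
      }
      TimeUnit.MILLISECONDS.sleep(100);
    }
    return false;               // timed out; the assertion would have failed anyway
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicReference<String> executionType = new AtomicReference<>("GUARANTEED");
    // simulate the NM dispatcher applying the update after a delay
    new Thread(() -> {
      try { Thread.sleep(300); } catch (InterruptedException ignored) { }
      executionType.set("OPPORTUNISTIC");
    }).start();
    System.out.println(waitFor(executionType, "OPPORTUNISTIC", 5000));  // true
  }
}
{code}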

> TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic
>  is flaky 
> 
>
> Key: YARN-7372
> URL: https://issues.apache.org/jira/browse/YARN-7372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>  Labels: unit-test
> Attachments: YARN-7372.00.patch, YARN-7372.01.patch
>
>
> testContainerUpdateExecTypeGuaranteedToOpportunistic waits for the container 
> to be running before it sends the container update request.
> The container update is handled asynchronously in the node manager, and it does 
> not trigger a visible state transition. If the node manager event
> dispatch thread is slow, the unit test can fail at the assertion 
> {code} Assert.assertEquals(ExecutionType.OPPORTUNISTIC, 
> status.getExecutionType());{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7361) Improve the docker container runtime documentation

2017-10-20 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212819#comment-16212819
 ] 

Eric Badger commented on YARN-7361:
---

+1 (non-binding) looks good to me

> Improve the docker container runtime documentation
> --
>
> Key: YARN-7361
> URL: https://issues.apache.org/jira/browse/YARN-7361
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-7361.001.patch
>
>
> During review of YARN-7230, it was found that 
> yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker 
> containers documentation in most of the active branches. We can also improve 
> the warning that was introduced in YARN-6622.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7261) Add debug message in class FSDownload for better download latency monitoring

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212810#comment-16212810
 ] 

Hadoop QA commented on YARN-7261:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m  5s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
36s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 12s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7261 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893189/YARN-7261.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 75414fcbb83a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1f4cdf1 |
| Default Java | 1.8.0_131 |
| unit | 

[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories

2017-10-20 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212799#comment-16212799
 ] 

Eric Badger commented on YARN-7353:
---

Test failure is unrelated. [~eyang], [~vvasudev] could you review?

> Docker permitted volumes don't properly check for directories
> -
>
> Key: YARN-7353
> URL: https://issues.apache.org/jira/browse/YARN-7353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-7353.001.patch, YARN-7353.002.patch, 
> YARN-7353.003.patch
>
>
> {noformat:title=docker-util.c:check_mount_permitted()}
> // directory check
> permitted_mount_len = strlen(permitted_mounts[i]);
> if (permitted_mount_len > 0
> && permitted_mounts[i][permitted_mount_len - 1] == '/') {
>   if (strncmp(normalized_path, permitted_mounts[i], permitted_mount_len) 
> == 0) {
> ret = 1;
> break;
>   }
> }
> {noformat}
> This code will treat "/home/" as a directory, but not "/home"
> {noformat}
> [  FAILED  ] 3 tests, listed below:
> [  FAILED  ] TestDockerUtil.test_check_mount_permitted
> [  FAILED  ] TestDockerUtil.test_normalize_mounts
> [  FAILED  ] TestDockerUtil.test_add_rw_mounts
> {noformat}
> Additionally, YARN-6623 introduced new test failures in the C++ 
> container-executor test "cetest"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7361) Improve the docker container runtime documentation

2017-10-20 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212794#comment-16212794
 ] 

Shane Kumpf commented on YARN-7361:
---

The patch brings over the warning and missing property from YARN-7230. I 
believe we need this in trunk, branch-2, and branch-3.0.

> Improve the docker container runtime documentation
> --
>
> Key: YARN-7361
> URL: https://issues.apache.org/jira/browse/YARN-7361
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-7361.001.patch
>
>
> During review of YARN-7230, it was found that 
> yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker 
> containers documentation in most of the active branches. We can also improve 
> the warning that was introduced in YARN-6622.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7353) Docker permitted volumes don't properly check for directories

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212793#comment-16212793
 ] 

Hadoop QA commented on YARN-7353:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
24m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 50s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 46s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 36s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7353 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893133/YARN-7353.003.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 9f1427ed9ff7 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1f4cdf1 |
| Default Java | 1.8.0_131 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/18052/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18052/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18052/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Docker permitted volumes don't properly check for directories
> -
>
> Key: YARN-7353
> URL: https://issues.apache.org/jira/browse/YARN-7353
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-7353.001.patch, YARN-7353.002.patch, 
> YARN-7353.003.patch
>
>
> {noformat:title=docker-util.c:check_mount_permitted()}
> // directory check
> permitted_mount_len = strlen(permitted_mounts[i]);
>  

[jira] [Updated] (YARN-7102) NM heartbeat stuck when responseId overflows MAX_INT

2017-10-20 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-7102:
-
Attachment: YARN-7102-branch-2.v9.patch

Thanks for porting the patches!  I'm uploading the branch-2 patch again since 
Jenkins never commented on it.  When both patches were attached at the same 
time it only commented on the 2.8 patch.

Speaking of the 2.8 patch, it deleted the Overrides annotation on 
NodeInfo#pullNewlyIncreasedContainers, which I assume was unintentional.  
Otherwise it looks good.  I agree the test failures are unrelated.  
TestClientRMTokens and TestAMAuthorization are failing due to unknown host 
exceptions triggered by the docker environment, and the capacity scheduler 
preemption test is passing locally.


> NM heartbeat stuck when responseId overflows MAX_INT
> 
>
> Key: YARN-7102
> URL: https://issues.apache.org/jira/browse/YARN-7102
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Critical
> Attachments: YARN-7102-branch-2.8.v10.patch, 
> YARN-7102-branch-2.8.v9.patch, YARN-7102-branch-2.v9.patch, 
> YARN-7102-branch-2.v9.patch, YARN-7102.v1.patch, YARN-7102.v2.patch, 
> YARN-7102.v3.patch, YARN-7102.v4.patch, YARN-7102.v5.patch, 
> YARN-7102.v6.patch, YARN-7102.v7.patch, YARN-7102.v8.patch, YARN-7102.v9.patch
>
>
> ResponseId overflow problem in the NM-RM heartbeat. This is the same as the 
> AM-RM heartbeat issue in YARN-6640; please refer to YARN-6640 for details. 
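
As a rough illustration of the wrap-around handling such a fix involves (a hedged sketch, not the YARN-7102 patch; the helpers next_response_id and is_duplicate_heartbeat are invented names), a responseId can be advanced and compared safely across the INT_MAX boundary like this:

{noformat}
#include <limits.h>
#include <stdio.h>

/* Sketch: advance a heartbeat responseId so it wraps to 0 after INT_MAX
 * instead of overflowing into a negative value the protocol never expects. */
static int next_response_id(int current) {
  return (current == INT_MAX) ? 0 : current + 1;
}

/* Wrap-aware duplicate check: a heartbeat is a resend of the previous one
 * if it still carries the id that preceded the last acknowledged id. */
static int is_duplicate_heartbeat(int last_acked, int received) {
  int previous = (last_acked == 0) ? INT_MAX : last_acked - 1;
  return received == previous;
}

int main(void) {
  printf("%d\n", next_response_id(INT_MAX));          /* 0: wraps, no overflow */
  printf("%d\n", is_duplicate_heartbeat(0, INT_MAX)); /* 1: wrapped resend */
  return 0;
}
{noformat}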



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7361) Improve the docker container runtime documentation

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212788#comment-16212788
 ] 

Hadoop QA commented on YARN-7361:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
29m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-7361 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893223/YARN-7361.001.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 452de7033eaf 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 1f4cdf1 |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18055/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve the docker container runtime documentation
> --
>
> Key: YARN-7361
> URL: https://issues.apache.org/jira/browse/YARN-7361
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-7361.001.patch
>
>
> During review of YARN-7230, it was found that 
> yarn.nodemanager.runtime.linux.docker.capabilities is missing from the docker 
> containers documentation in most of the active branches. We can also improve 
> the warning that was introduced in YARN-6622.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping

2017-10-20 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7117:
---
Attachment: YARN-7117.poc.1.patch

> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue 
> Mapping
> --
>
> Key: YARN-7117
> URL: https://issues.apache.org/jira/browse/YARN-7117
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: 
> YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, 
> YARN-7117.poc.1.patch, YARN-7117.poc.patch
>
>
> Currently the Capacity Scheduler doesn't support auto creation of queues when 
> doing queue mapping. We see more and more use cases that have complex queue 
> mapping policies configured to handle application-to-queue mapping. 
> The most common use case of CapacityScheduler queue mapping is to create one 
> queue for each user/group. However, updating {{capacity-scheduler.xml}} and 
> running {{RMAdmin:refreshQueues}} is needed whenever a new user/group onboards. 
> One option to solve this problem is to automatically create queues when a new 
> user/group arrives.
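
For context, a sketch of the static mapping that exists today (queue names here are made up; this illustrates the existing yarn.scheduler.capacity.queue-mappings property, not the proposed auto-creation feature): every target queue still has to be defined in capacity-scheduler.xml and refreshQueues run before the mapping takes effect, which is the manual step this proposal aims to remove.

{noformat}
<!-- capacity-scheduler.xml: illustrative only -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- map each user to a queue named after the user, and the group
       "analysts" to an "analytics" queue -->
  <value>u:%user:%user,g:analysts:analytics</value>
</property>
{noformat}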



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7117) Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue Mapping

2017-10-20 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7117:
---
Attachment: (was: YARN-7117.poc.1.patch)

> Capacity Scheduler: Support Auto Creation of Leaf Queues While Doing Queue 
> Mapping
> --
>
> Key: YARN-7117
> URL: https://issues.apache.org/jira/browse/YARN-7117
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: 
> YARN-7117.Capacity.Scheduler.Support.Auto.Creation.Of.Leaf.Queue.pdf, 
> YARN-7117.poc.patch
>
>
> Currently the Capacity Scheduler doesn't support auto creation of queues when 
> doing queue mapping. We see more and more use cases that have complex queue 
> mapping policies configured to handle application-to-queue mapping. 
> The most common use case of CapacityScheduler queue mapping is to create one 
> queue for each user/group. However, updating {{capacity-scheduler.xml}} and 
> running {{RMAdmin:refreshQueues}} is needed whenever a new user/group onboards. 
> One option to solve this problem is to automatically create queues when a new 
> user/group arrives.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7169) Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2)

2017-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212730#comment-16212730
 ] 

Hadoop QA commented on YARN-7169:
-

(!) A patch to the testing environment has been detected. 
Re-executing against the patched versions to perform further tests. 
The console is at 
https://builds.apache.org/job/PreCommit-YARN-Build/18056/console in case of 
problems.


> Backport new yarn-ui to branch2 code (starting with YARN-5355_branch2)
> --
>
> Key: YARN-7169
> URL: https://issues.apache.org/jira/browse/YARN-7169
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>Priority: Critical
> Attachments: FlowRunDetails_Sleepjob.png, Metrics_Yarn_UI.png, 
> YARN-7169-YARN-3368_branch2.0001.patch, 
> YARN-7169-YARN-5355_branch2.0001.patch, 
> YARN-7169-YARN-5355_branch2.0002.patch, 
> YARN-7169-YARN-5355_branch2.0003.patch, 
> YARN-7169-YARN-5355_branch2.0004.patch, 
> YARN-7169-YARN-5355_branch2.0004.patch, YARN-7169-branch-2.0001.patch, 
> YARN-7169-branch-2.0002.patch, ui_commits(1)
>
>
> Jira to track the backport of the new yarn-ui onto branch2. Right now it is 
> being added into Timeline Service v2's branch2, which is YARN-5355_branch2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


