Re: Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server

2017-01-16 Thread Andrew Onischuk

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/#review161724
---


Ship it!

- Andrew Onischuk


On Jan. 13, 2017, 1:59 p.m., Sebastian Toader wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55494/
> ---
> 
> (Updated Jan. 13, 2017, 1:59 p.m.)
> 
> 
> Review request for Ambari, Attila Doroszlai, Andrew Onischuk, Sandor Magyari, 
> and Sid Wagle.
> 
> 
> Bugs: AMBARI-19520
> https://issues.apache.org/jira/browse/AMBARI-19520
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> Problem:
> 
> When the Ambari server is restarted, it asks the agents to re-register with
> the server.
> Once an agent has successfully re-registered, it should transition out of the
> heartbeat lost state. However, in some cases it takes a while for agents to
> leave the heartbeat lost state, so the server may ask the agent to
> re-register again.
> 
> Solution:
> Upon agent re-registration, ensure that the StatusCommandExecutor child
> process is spawned before the status commands received from the server (in
> the registration response) are added to the status command queue.
> 
> 
> Diffs
> ---
> 
>   ambari-agent/src/main/python/ambari_agent/Controller.py 6b1b196 
>   ambari-agent/src/main/python/ambari_agent/main.py 2e0517b 
> 
> Diff: https://reviews.apache.org/r/55494/diff/
> 
> 
> Testing
> ---
> 
> Manually tested covering:
> 1. Restart agent
> 2. Restart ambari-server while agents are up and running
> 3. Kill StatusCommandExecutor child process
> 
> Unit test results:
> 
> Total run:1158
> Total errors:0
> Total failures:0
> OK
> 
> Ran 452 tests in 20.976s
> 
> OK
> 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Ambari Main ... SUCCESS [10.142s]
> [INFO] Apache Ambari Project POM . SUCCESS [0.029s]
> [INFO] Ambari Views .. SUCCESS [1.707s]
> [INFO] utility ... SUCCESS [1.189s]
> [INFO] ambari-metrics  SUCCESS [0.473s]
> [INFO] Ambari Metrics Common . SUCCESS [1.012s]
> [INFO] Ambari Server . SUCCESS [1:45.492s]
> [INFO] Ambari Agent .. SUCCESS [25.860s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> 
> 
> Thanks,
> 
> Sebastian Toader
> 
>



Re: Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server

2017-01-16 Thread Sandor Magyari

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/#review161699
---


Ship it!

- Sandor Magyari


On Jan. 13, 2017, 1:59 p.m., Sebastian Toader wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55494/
> ---
> 
> (Updated Jan. 13, 2017, 1:59 p.m.)
> 
> 
> Review request for Ambari, Attila Doroszlai, Andrew Onischuk, Sandor Magyari, 
> and Sid Wagle.
> 
> 
> Bugs: AMBARI-19520
> https://issues.apache.org/jira/browse/AMBARI-19520
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> Problem:
> 
> When the Ambari server is restarted, it asks the agents to re-register with
> the server.
> Once an agent has successfully re-registered, it should transition out of the
> heartbeat lost state. However, in some cases it takes a while for agents to
> leave the heartbeat lost state, so the server may ask the agent to
> re-register again.
> 
> Solution:
> Upon agent re-registration, ensure that the StatusCommandExecutor child
> process is spawned before the status commands received from the server (in
> the registration response) are added to the status command queue.
> 
> 
> Diffs
> ---
> 
>   ambari-agent/src/main/python/ambari_agent/Controller.py 6b1b196 
>   ambari-agent/src/main/python/ambari_agent/main.py 2e0517b 
> 
> Diff: https://reviews.apache.org/r/55494/diff/
> 
> 
> Testing
> ---
> 
> Manually tested covering:
> 1. Restart agent
> 2. Restart ambari-server while agents are up and running
> 3. Kill StatusCommandExecutor child process
> 
> Unit test results:
> 
> Total run:1158
> Total errors:0
> Total failures:0
> OK
> 
> Ran 452 tests in 20.976s
> 
> OK
> 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Ambari Main ... SUCCESS [10.142s]
> [INFO] Apache Ambari Project POM . SUCCESS [0.029s]
> [INFO] Ambari Views .. SUCCESS [1.707s]
> [INFO] utility ... SUCCESS [1.189s]
> [INFO] ambari-metrics  SUCCESS [0.473s]
> [INFO] Ambari Metrics Common . SUCCESS [1.012s]
> [INFO] Ambari Server . SUCCESS [1:45.492s]
> [INFO] Ambari Agent .. SUCCESS [25.860s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> 
> 
> Thanks,
> 
> Sebastian Toader
> 
>



Re: Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server

2017-01-13 Thread Attila Doroszlai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/#review161502
---


Ship it!

- Attila Doroszlai


On Jan. 13, 2017, 11:50 a.m., Sebastian Toader wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55494/
> ---
> 
> (Updated Jan. 13, 2017, 11:50 a.m.)
> 
> 
> Review request for Ambari, Attila Doroszlai, Andrew Onischuk, and Sandor 
> Magyari.
> 
> 
> Bugs: AMBARI-19520
> https://issues.apache.org/jira/browse/AMBARI-19520
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> Problem:
> 
> When the Ambari server is restarted, it asks the agents to re-register with
> the server.
> Once an agent has successfully re-registered, it should transition out of the
> heartbeat lost state. However, in some cases it takes a while for agents to
> leave the heartbeat lost state, so the server may ask the agent to
> re-register again.
> 
> Solution:
> Upon agent re-registration, ensure that the StatusCommandExecutor child
> process is spawned before the status commands received from the server (in
> the registration response) are added to the status command queue.
> 
> 
> Diffs
> ---
> 
>   ambari-agent/src/main/python/ambari_agent/Controller.py 6b1b196 
>   ambari-agent/src/main/python/ambari_agent/main.py 2e0517b 
> 
> Diff: https://reviews.apache.org/r/55494/diff/
> 
> 
> Testing
> ---
> 
> Manually tested covering:
> 1. Restart agent
> 2. Restart ambari-server while agents are up and running
> 3. Kill StatusCommandExecutor child process
> 
> Unit test results:
> 
> Total run:1158
> Total errors:0
> Total failures:0
> OK
> 
> Ran 452 tests in 20.976s
> 
> OK
> 
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Ambari Main ... SUCCESS [10.142s]
> [INFO] Apache Ambari Project POM . SUCCESS [0.029s]
> [INFO] Ambari Views .. SUCCESS [1.707s]
> [INFO] utility ... SUCCESS [1.189s]
> [INFO] ambari-metrics  SUCCESS [0.473s]
> [INFO] Ambari Metrics Common . SUCCESS [1.012s]
> [INFO] Ambari Server . SUCCESS [1:45.492s]
> [INFO] Ambari Agent .. SUCCESS [25.860s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> 
> 
> Thanks,
> 
> Sebastian Toader
> 
>



Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server

2017-01-13 Thread Sebastian Toader

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/
---

Review request for Ambari, Attila Doroszlai, Andrew Onischuk, and Sandor 
Magyari.


Bugs: AMBARI-19520
https://issues.apache.org/jira/browse/AMBARI-19520


Repository: ambari


Description
---

Problem:

When the Ambari server is restarted, it asks the agents to re-register with
the server.
Once an agent has successfully re-registered, it should transition out of the
heartbeat lost state. However, in some cases it takes a while for agents to
leave the heartbeat lost state, so the server may ask the agent to re-register
again.

Solution:
Upon agent re-registration, ensure that the StatusCommandExecutor child
process is spawned before the status commands received from the server (in the
registration response) are added to the status command queue.
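
The ordering described in the solution can be illustrated with a minimal
sketch. This is not the code from the diff: the names StatusExecutorSketch,
handle_registration_response and run_demo are hypothetical, and a thread
stands in for the StatusCommandExecutor child process to keep the example
portable. It only shows the idea that the consumer is started before any
status command is put on the queue.

```python
import queue
import threading


class StatusExecutorSketch(threading.Thread):
    """Hypothetical stand-in for the StatusCommandExecutor child process."""

    def __init__(self, status_queue):
        super().__init__(daemon=True)
        self.status_queue = status_queue
        self.results = []

    def run(self):
        # Consume status commands until the stop sentinel (None) arrives.
        while True:
            command = self.status_queue.get()
            if command is None:
                break
            self.results.append("status of " + command)


def handle_registration_response(status_commands, status_queue, executor=None):
    """Make sure the executor is running, then enqueue the status commands."""
    # Step 1: (re)spawn the executor if it is missing or no longer alive.
    if executor is None or not executor.is_alive():
        executor = StatusExecutorSketch(status_queue)
        executor.start()
    # Step 2: only now add the commands from the registration response.
    for command in status_commands:
        status_queue.put(command)
    return executor


def run_demo():
    status_queue = queue.Queue()
    executor = handle_registration_response(["DATANODE", "NAMENODE"], status_queue)
    status_queue.put(None)  # ask the worker to stop after draining the queue
    executor.join(timeout=5)
    return executor.results
```

With this ordering, the commands that arrive in the registration response
always find a live consumer, so they are processed right away instead of
waiting in the queue until an executor is spawned later.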


Diffs
---

  ambari-agent/src/main/python/ambari_agent/Controller.py 6b1b196 
  ambari-agent/src/main/python/ambari_agent/main.py 2e0517b 

Diff: https://reviews.apache.org/r/55494/diff/


Testing
---

Manually tested covering:
1. Restart agent
2. Restart ambari-server while agents are up and running
3. Kill StatusCommandExecutor child process

Unit test results:

Total run:1158
Total errors:0
Total failures:0
OK

Ran 452 tests in 20.976s

OK

[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Ambari Main ... SUCCESS [10.142s]
[INFO] Apache Ambari Project POM . SUCCESS [0.029s]
[INFO] Ambari Views .. SUCCESS [1.707s]
[INFO] utility ... SUCCESS [1.189s]
[INFO] ambari-metrics  SUCCESS [0.473s]
[INFO] Ambari Metrics Common . SUCCESS [1.012s]
[INFO] Ambari Server . SUCCESS [1:45.492s]
[INFO] Ambari Agent .. SUCCESS [25.860s]
[INFO] 
[INFO] BUILD SUCCESS


Thanks,

Sebastian Toader