Re: Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/#review161724
---

Ship it!

Ship It!

- Andrew Onischuk

On Jan. 13, 2017, 1:59 p.m., Sebastian Toader wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55494/
> ---
>
> (Updated Jan. 13, 2017, 1:59 p.m.)
>
> Review request for Ambari, Attila Doroszlai, Andrew Onischuk, Sandor Magyari, and Sid Wagle.
>
> Bugs: AMBARI-19520
>     https://issues.apache.org/jira/browse/AMBARI-19520
>
> Repository: ambari
>
> Description
> ---
>
> Problem:
>
> When the Ambari server is restarted, it asks the agents to re-register with the server.
> Once an agent has successfully re-registered, it should transition out of the heartbeat lost state. However, in some cases it takes a while for agents to leave the heartbeat lost state, so the server may ask the agent to re-register again.
>
> Solution:
>
> Ensure that, upon agent re-registration, the StatusCommandExecutor child process is spawned before the status commands received from the server (in the response to the registration) are added to the status command queue.
>
> Diffs
> ---
>
>   ambari-agent/src/main/python/ambari_agent/Controller.py 6b1b196
>   ambari-agent/src/main/python/ambari_agent/main.py 2e0517b
>
> Diff: https://reviews.apache.org/r/55494/diff/
>
> Testing
> ---
>
> Manually tested covering:
> 1. Restart agent
> 2. Restart ambari-server with agents up and running
> 3. Kill StatusCommandExecutor child process
>
> Unit test results:
>
> Total run:1158
> Total errors:0
> Total failures:0
> OK
>
> Ran 452 tests in 20.976s
>
> OK
>
> [INFO]
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Ambari Main ....................... SUCCESS [10.142s]
> [INFO] Apache Ambari Project POM ......... SUCCESS [0.029s]
> [INFO] Ambari Views ...................... SUCCESS [1.707s]
> [INFO] utility ........................... SUCCESS [1.189s]
> [INFO] ambari-metrics .................... SUCCESS [0.473s]
> [INFO] Ambari Metrics Common ............. SUCCESS [1.012s]
> [INFO] Ambari Server ..................... SUCCESS [1:45.492s]
> [INFO] Ambari Agent ...................... SUCCESS [25.860s]
> [INFO]
> [INFO] BUILD SUCCESS
>
> Thanks,
>
> Sebastian Toader
Re: Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/#review161699
---

Ship it!

Ship It!

- Sandor Magyari

On Jan. 13, 2017, 1:59 p.m., Sebastian Toader wrote:
> (Updated Jan. 13, 2017, 1:59 p.m.)
>
> Review request for Ambari, Attila Doroszlai, Andrew Onischuk, Sandor Magyari, and Sid Wagle.
Re: Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/#review161502
---

Ship it!

Ship It!

- Attila Doroszlai

On Jan. 13, 2017, 11:50 a.m., Sebastian Toader wrote:
> (Updated Jan. 13, 2017, 11:50 a.m.)
>
> Review request for Ambari, Attila Doroszlai, Andrew Onischuk, and Sandor Magyari.
Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/
---

Review request for Ambari, Attila Doroszlai, Andrew Onischuk, and Sandor Magyari.

Bugs: AMBARI-19520
    https://issues.apache.org/jira/browse/AMBARI-19520

Repository: ambari

Description
---

Problem:

When the Ambari server is restarted, it asks the agents to re-register with the server.
Once an agent has successfully re-registered, it should transition out of the heartbeat lost state. However, in some cases it takes a while for agents to leave the heartbeat lost state, so the server may ask the agent to re-register again.

Solution:

Ensure that, upon agent re-registration, the StatusCommandExecutor child process is spawned before the status commands received from the server (in the response to the registration) are added to the status command queue.

Diffs
---

  ambari-agent/src/main/python/ambari_agent/Controller.py 6b1b196
  ambari-agent/src/main/python/ambari_agent/main.py 2e0517b

Diff: https://reviews.apache.org/r/55494/diff/

Testing
---

Manually tested covering:
1. Restart agent
2. Restart ambari-server with agents up and running
3. Kill StatusCommandExecutor child process

Unit test results:

Total run:1158
Total errors:0
Total failures:0
OK

Ran 452 tests in 20.976s

OK

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Ambari Main ....................... SUCCESS [10.142s]
[INFO] Apache Ambari Project POM ......... SUCCESS [0.029s]
[INFO] Ambari Views ...................... SUCCESS [1.707s]
[INFO] utility ........................... SUCCESS [1.189s]
[INFO] ambari-metrics .................... SUCCESS [0.473s]
[INFO] Ambari Metrics Common ............. SUCCESS [1.012s]
[INFO] Ambari Server ..................... SUCCESS [1:45.492s]
[INFO] Ambari Agent ...................... SUCCESS [25.860s]
[INFO]
[INFO] BUILD SUCCESS

Thanks,

Sebastian Toader
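
For readers without access to the diff, the ordering described under "Solution" can be illustrated with a minimal sketch. This is not the code from the patch: the class names `Agent` and `StatusExecutorStub`, the method names, and the `statusCommands` response key are assumptions made for illustration, and a thread stands in for the StatusCommandExecutor child process so that the example stays self-contained. It shows only the idea that the consumer is (re)started before any status commands from the registration response are queued.

```python
import queue
import threading

_STOP = object()  # sentinel used to stop the stand-in executor


class StatusExecutorStub(threading.Thread):
    """Hypothetical stand-in for the StatusCommandExecutor child process.

    A thread is used instead of a process purely to keep the sketch simple.
    """

    def __init__(self, commands):
        super().__init__(daemon=True)
        self.commands = commands
        self.processed = []

    def run(self):
        while True:
            cmd = self.commands.get()
            if cmd is _STOP:
                break
            self.processed.append(cmd)


class Agent:
    """Hypothetical agent-side controller, reduced to the ordering concern."""

    def __init__(self):
        self.status_queue = queue.Queue()
        self.executor = None

    def ensure_executor_running(self):
        # (Re)spawn the consumer if it was never started or has died.
        if self.executor is None or not self.executor.is_alive():
            self.executor = StatusExecutorStub(self.status_queue)
            self.executor.start()

    def on_registration_response(self, response):
        # Step 1: make sure the consumer exists before anything is queued.
        self.ensure_executor_running()
        # Step 2: only then enqueue the status commands from the response.
        # The "statusCommands" key is an assumption for this sketch.
        for cmd in response.get("statusCommands", []):
            self.status_queue.put(cmd)

    def shutdown(self):
        if self.executor is not None and self.executor.is_alive():
            self.status_queue.put(_STOP)
            self.executor.join(timeout=5)
```

Calling `on_registration_response` again after the executor has stopped starts a fresh executor first, which loosely mirrors manual test case 3 (killing the child process).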