-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59684/
-----------------------------------------------------------
Review request for Ambari, Attila Doroszlai and Myroslav Papirkovskyy.
Bugs: AMBARI-21142
https://issues.apache.org/jira/browse/AMBARI-21142
Repository: ambari
Description
-------
Currently, if Ambari server-agent communication gets out of sync, only limited
information is logged. This makes it difficult to troubleshoot the root cause
and to understand why the application cannot recover from it.
Diffs
-----
ambari-agent/src/main/python/ambari_agent/Controller.py 83f1da8
ambari-server/src/main/java/org/apache/ambari/server/agent/HeartBeatHandler.java 6b93462
Diff: https://reviews.apache.org/r/59684/diff/1/
Testing
-------
Manual testing: connection to the server lost during heartbeat response
processing on the agent side:
ambari-agent log:
ERROR 2017-05-31 09:45:52,778 Controller.py:469 - Connection to
ambari-server.node.dc1.consul was lost (details=Simulating connection loss on
heartbeat response !!!)
...
INFO 2017-05-31 09:45:53,921 Controller.py:438 - Reconnected to
https://ambari-server.node.dc1.consul:8441/agent/v1/heartbeat/ambari-agent-1.node.dc1.consul
ambari-server log:
31 May 2017 09:45:53,918 WARN [qtp-ambari-agent-37] HeartBeatHandler:212 - Old
responseId=103 received form host ambari-agent-1.node.dc1.consul - response was
lost - returning cached response with responseId=104
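For readers unfamiliar with the responseId handshake, the server log line above
can be read roughly as follows. This is only a minimal sketch inferred from
that log message, not the actual HeartBeatHandler code: the class name
HeartbeatSketch, the handle() method, the "status" field and the starting
responseId of 0 are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("heartbeat_sketch")


class HeartbeatSketch:
    """Hypothetical server-side responseId tracker (illustration only)."""

    def __init__(self):
        # Last response sent to each host, keyed by hostname.
        self.last_response = {}

    def handle(self, hostname, agent_response_id):
        cached = self.last_response.get(hostname)
        # Assumption: a host with no cached response starts at responseId 0.
        current_id = cached["responseId"] if cached else 0

        if agent_response_id == current_id:
            # Agent saw our last response: answer with the next id and cache it.
            response = {"responseId": current_id + 1, "status": "ok"}
            self.last_response[hostname] = response
            return response

        if cached is not None and agent_response_id == current_id - 1:
            # Agent is one behind: our last response was lost, so resend it.
            logger.warning(
                "Old responseId=%s received from host %s - response was lost - "
                "returning cached response with responseId=%s",
                agent_response_id, hostname, current_id)
            return cached

        # Anything else means the two sides are out of sync; log both ids.
        logger.error(
            "Out of sync with host %s: agent responseId=%s, server responseId=%s",
            hostname, agent_response_id, current_id)
        return {"responseId": current_id, "status": "out_of_sync"}
```

In this sketch, a repeated heartbeat carrying the previous responseId gets the
cached response back, while any other mismatch is logged with both ids so the
divergence can be diagnosed from the log.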
Unit tests:
Agent:
Ran 452 tests in 30.156s
OK
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Ambari Main ........................................ SUCCESS [ 0.529 s]
[INFO] Apache Ambari Project POM .......................... SUCCESS [ 0.004 s]
[INFO] utility ............................................ SUCCESS [ 2.002 s]
[INFO] Ambari Agent ....................................... SUCCESS [ 48.909 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
Server:
Total run:1160
Total errors:0
Total failures:0
OK
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Ambari Main ........................................ SUCCESS [ 0.528 s]
[INFO] Apache Ambari Project POM .......................... SUCCESS [ 0.003 s]
[INFO] Ambari Views ....................................... SUCCESS [ 2.305 s]
[INFO] utility ............................................ SUCCESS [ 1.066 s]
[INFO] ambari-metrics ..................................... SUCCESS [ 0.298 s]
[INFO] Ambari Metrics Common .............................. SUCCESS [ 4.904 s]
[INFO] Ambari Server ...................................... SUCCESS [48:33 min]
[INFO] ------------------------------------------------------------------------
Thanks,
Sebastian Toader