> On June 21, 2016, 3:25 p.m., Andrew Onischuk wrote:
> > We have some logs which are triggered with every heartbeat. This would
> > flood logs badly if done every second. Could we fix this? (possibly in a
> > separate jira)

I think handling these on agent would be enough:

INFO 2016-06-21 15:24:24,407 Controller.py:271 - Heartbeat response received (id = 67)
INFO 2016-06-21 15:24:34,308 Heartbeat.py:81 - Building Heartbeat: {responseId = 67, timestamp = 1466522674308, commandsInProgress = True, componentsMapped = True,recoveryTimestamp = 1466522118933}
INFO 2016-06-21 15:24:34,353 Controller.py:271 - Heartbeat response received (id = 68)
INFO 2016-06-21 15:24:44,254 Heartbeat.py:81 - Building Heartbeat: {responseId = 68, timestamp = 1466522684253, commandsInProgress = True, componentsMapped = True,recoveryTimestamp = 1466522118933}
INFO 2016-06-21 15:24:44,299 Controller.py:271 - Heartbeat response received (id = 69)
INFO 2016-06-21 15:24:54,200 Heartbeat.py:81 - Building Heartbeat: {responseId = 69, timestamp = 1466522694200, commandsInProgress = True, componentsMapped = True,recoveryTimestamp = 1466522118933}
INFO 2016-06-21 15:24:54,256 Controller.py:271 - Heartbeat response received (id = 70)

Also, we need to check whether there are per-heartbeat logs on the server.


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48722/#review138851
-----------------------------------------------------------


On June 21, 2016, 3:19 p.m., Sebastian Toader wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48722/
> -----------------------------------------------------------
>
> (Updated June 21, 2016, 3:19 p.m.)
>
>
> Review request for Ambari, Andrew Onischuk, Laszlo Puskas, Robert Levas,
> Sandor Magyari, and Sumit Mohanty.
>
>
> Bugs: AMBARI-17248
>     https://issues.apache.org/jira/browse/AMBARI-17248
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Commands to be executed by ambari-agents are being sent down by the server in
> the response message to agent heartbeat messages.
> The server processes the received heartbeat, checks whether there are
> further commands scheduled to be executed by ambari-agent, and adds those to
> the heartbeat response for the ambari-agent.
> The server organises the commands that can be executed in parallel into
> stages. Ambari server ensures that only the commands of a single stage are
> scheduled to be executed by the agent and starts scheduling the commands of
> the next stage only after all commands of the current stage have finished
> successfully.
> The processing of command status received with the heartbeat message happens
> in HeartBeatProcessor and ActionScheduler, asynchronously to heartbeat
> response creation, so when the heartbeat response is created the commands
> for the next stage are not scheduled yet. This means that the next commands
> will be sent to the agent only with the next heartbeat.
> Agents currently send a heartbeat to the server on command completion or,
> when there are no commands to be executed, after a timeout of
> self.netutil.HEARTBEAT_IDDLE_INTERVAL_SEC -
> self.netutil.MINIMUM_INTERVAL_BETWEEN_HEARTBEATS, which is ~10 seconds.
> This means that when the server receives a heartbeat triggered by the
> completion of the last command of the current stage, it will send the
> commands for the next stage only ~10 seconds later, when the next heartbeat
> is received. This leads to agents spending a considerable amount of time idle
> when there are multiple stages to be executed.
> Agents should heartbeat at a higher rate while there are still pending stages
> to be executed.
>
>
> Diffs
> -----
>
>   ambari-agent/conf/unix/ambari-agent.ini 8f2ab1b
>   ambari-agent/conf/unix/upgrade_agent_configs.py 583b5aa
>   ambari-agent/conf/windows/ambari-agent.ini df88be6
>   ambari-agent/src/main/python/ambari_agent/AmbariConfig.py 89a881a
>   ambari-agent/src/main/python/ambari_agent/Controller.py e981a76
>   ambari-agent/src/main/python/ambari_agent/Heartbeat.py 91098e0
>   ambari-agent/src/main/python/ambari_agent/NetUtil.py 80bf3ae
>   ambari-agent/src/test/python/ambari_agent/TestHeartbeat.py f113083
>   ambari-agent/src/test/python/ambari_agent/TestNetUtil.py d72e319
>   ambari-agent/src/test/python/ambari_agent/examples/ControllerTester.py 8103872
>   ambari-server/src/main/java/org/apache/ambari/server/agent/HeartBeatHandler.java 35a37e3
>   ambari-server/src/main/java/org/apache/ambari/server/agent/HeartBeatResponse.java 1ab7ae9
>   ambari-server/src/main/java/org/apache/ambari/server/state/Cluster.java ac0ddd2
>   ambari-server/src/main/java/org/apache/ambari/server/state/Clusters.java bd9de13
>   ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 3d2388e
>   ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClustersImpl.java c26e1e9
>   ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterImplTest.java 627ade9
>
> Diff: https://reviews.apache.org/r/48722/diff/
>
>
> Testing
> -------
>
> Manual testing.
>
> Unit tests succeeded.
>
>
> Thanks,
>
> Sebastian Toader
>
>
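
For readers following the timing argument in the quoted description, here is a rough sketch of how the heartbeat wait timeout could be chosen. It is not the code from the patch. Only the formula (idle interval minus minimum interval) and the ~10 second result come from the description. The 0.1 second minimum gap, the 1 second fast interval (PENDING_STAGES_INTERVAL_SEC) and the has_pending_stages flag are assumptions made up for illustration.

```python
# Illustrative sketch only, not the agent's actual implementation.
# The constant names mirror those quoted in the description; the values are
# assumptions chosen so that the idle timeout comes out at roughly 10 seconds.
HEARTBEAT_IDDLE_INTERVAL_SEC = 10.0          # assumed value
MINIMUM_INTERVAL_BETWEEN_HEARTBEATS = 0.1    # assumed value
PENDING_STAGES_INTERVAL_SEC = 1.0            # hypothetical faster interval


def heartbeat_wait_timeout(has_pending_stages):
    """Return how long the agent waits before sending the next heartbeat.

    has_pending_stages is a hypothetical flag standing in for whatever the
    server reports about stages that are still waiting to be scheduled.
    """
    if has_pending_stages:
        interval = PENDING_STAGES_INTERVAL_SEC
    else:
        interval = HEARTBEAT_IDDLE_INTERVAL_SEC
    # Same shape as the formula in the description: interval minus minimum gap.
    return max(interval - MINIMUM_INTERVAL_BETWEEN_HEARTBEATS, 0.0)


def idle_time_between_stages(stage_count, faster_heartbeat):
    """Estimate the worst-case idle time spent waiting for next-stage commands.

    Each stage boundary costs up to one heartbeat timeout, because the next
    stage's commands only arrive with the following heartbeat response.
    """
    boundaries = max(stage_count - 1, 0)
    return boundaries * heartbeat_wait_timeout(faster_heartbeat)


if __name__ == "__main__":
    print("idle timeout:", heartbeat_wait_timeout(False))
    print("pending-stages timeout:", heartbeat_wait_timeout(True))
    print("idle time, 5 stages, slow:", idle_time_between_stages(5, False))
    print("idle time, 5 stages, fast:", idle_time_between_stages(5, True))
```

Under these assumed values the worst-case wait at each stage boundary drops from about 10 seconds to about 1 second. The trade-off is the one raised in the review comment: per-heartbeat INFO log lines would be emitted correspondingly more often.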