Hi Daniel,

Why does setting the agent to be one-way stop the keepalives, and why
does it do so only on the Windows agents? The *nix agents are apparently
sending their keepalives normally, since none of them appear
disconnected, even though they are configured one-way as well, and
behind the same network firewall as the Windows agents.

Also, we did some more testing in our lab overnight, and we realized
that a one-way Windows agent which can still hear from the manager
(i.e., it is configured to be one-way, but there isn't any network
configuration preventing the manager from communicating back) remains
properly connected. However, the same one-way Windows agent, when it
can't hear from the manager (i.e., there's a router blocking all UDP
traffic between the manager and the agent), shows as disconnected in
agent_control. So it seems to be a bug specifically in the one-way
Windows agents...

Thanks!
-Alisha

On Apr 12, 12:08 pm, Daniel Cid <[email protected]> wrote:
> Hi Alisha,
>
> Always Windows giving us problems :)
>
> The manager itself (remoted) doesn't keep state for any of the agents
> (it doesn't know if an agent is on or off). But whenever it receives
> the keep alive from the agent, it will update the timestamp of the
> agent file in the queue.
>
> That agent file is read by the other tools (agent_control, monitord,
> etc.) to list which agents are on and off. So even if your agent is
> sending lots of events and communicating properly with the manager,
> if it doesn't send the keep alive, the manager will never update the
> file and all the tools will think it is disconnected...
>
> Hope I made sense.
>
> thanks,
>
> On Tue, Apr 12, 2011 at 3:03 PM, Alisha Kloc <[email protected]> wrote:
> > Hi,
> >
> > I don't think it's because of the one-way setup. We've seen the
> > disconnect issue in lab/troubleshoot setups where one-way isn't
> > configured; also, our *nix agents are configured to be one-way as
> > well, but they always show properly as connected. It's only the
> > Windows agents that do this.
> >
> > Would there be any reason for the OSSEC manager to not attribute a
> > process monitor event to a Windows agent, so it doesn't realize that
> > the heartbeat sent by the process monitor is an event for purposes of
> > keeping the agent marked alive?
> >
> > Also, how exactly does OSSEC determine whether an agent is connected
> > or not? We were originally told that it just marks an agent as
> > disconnected if it hasn't seen an event from the agent in 15 minutes,
> > but you just now said the agents send keepalives. Could you clarify?
> >
> > Thanks!
> > -Alisha
> >
> > On Apr 12, 10:38 am, Daniel Cid <[email protected]> wrote:
> >> Hey,
> >>
> >> Maybe because it is set up as one-way? The manager uses the keep
> >> alives (internally sent by the agent) to keep track of whether they
> >> are up or not. Since you have it disabled, those keep alives are
> >> never sent, so they show up as disabled.
> >>
> >> thanks,
> >>
> >> On Mon, Apr 11, 2011 at 4:36 PM, Alisha Kloc <[email protected]>
> >> wrote:
> >> > Hi list,
> >> >
> >> > My team would like to report an unusual pattern we're seeing, and find
> >> > out whether it's a bug and if it will be fixed in a future OSSEC
> >> > version.
> >> >
> >> > Our setup is an OSSEC v2.4.1 manager monitoring around a dozen servers
> >> > and workstations (using agent v2.4.1), mixed Windows 2003 and *nix.
> >> > Our systems are strict one-way; that is, the agents can talk to the
> >> > manager, but the manager cannot talk to the agents. Our *nix agents
> >> > behave as expected, but our Windows 2003 agents frequently show up as
> >> > "disconnected" when checked with agent_control. However, the Windows
> >> > agents ARE connected, and are sending events before and after
> >> > agent_control shows them as disconnected.
> >> >
> >> > It was our understanding that agent_control marks an agent as
> >> > disconnected if that agent hasn't sent an event within the last 15
> >> > minutes (if this is incorrect, please tell me what it really is). Our
> >> > systems tend to be very quiet for long periods of time, so we
> >> > originally thought that the "disconnects" were happening simply
> >> > because there were no events.
> >> >
> >> > We implemented a heartbeat to fix this, which uses OSSEC's process
> >> > monitoring tool to periodically send events. On Windows, the process
> >> > monitor sends an event every 5-7 minutes. We expected that if OSSEC
> >> > saw a heartbeat every 5-7 minutes, the 15-minute disconnect period in
> >> > agent_control would never trigger.
> >> >
> >> > However, even with the heartbeat, the Windows agents continue to show
> >> > as "disconnected" for long periods of time - even if there was a
> >> > heartbeat a few minutes prior to running the agent_control query. In
> >> > addition, events continue to arrive at the manager despite the agent
> >> > showing disconnected.
> >> >
> >> > Below is the output of two agent_control commands, and two alerts from
> >> > OSSEC's alerts.log which show heartbeats 3 minutes before the first
> >> > agent_control and one minute after the second:
> >> > _____________________________________________________________________________________
> >> > # date
> >> > Mon Apr 11 18:10:51 GMT 2011
> >> > # ./agent_control -l
> >> >
> >> > OSSEC HIDS agent_control. List of available agents:
> >> > ID: 000, Name: manager (server), IP: 127.0.0.1, Active/Local
> >> > <snip>
> >> > ID: 009, Name: win-server1, IP: <IP>, Disconnected
> >> >
> >> > # date
> >> > Mon Apr 11 18:12:51 GMT 2011
> >> > # ./agent_control -l
> >> >
> >> > OSSEC HIDS agent_control. List of available agents:
> >> > ID: 000, Name: manager (server), IP: 127.0.0.1, Active/Local
> >> > <snip>
> >> > ID: 009, Name: win-server1, IP: <IP>, Disconnected
> >> >
> >> > /opt/ossec/logs/alerts/alerts.log:
> >> >
> >> > ** Alert 1302545271.2550465: - local,syslog,
> >> > 2011 Apr 11 18:07:51 (win-server1) <IP>->"C\\Program Files\ossec-agent
> >> > \win_heartbeat.bat"
> >> > Rule: 101003 (level 1) -> 'OSSEC heartbeat: OSSEC Windows is running'
> >> > Src IP: (none)
> >> > User: (none)
> >> > ossec: output: '"C\\Program Files\ossec-agent\win_heartbeat.bat"': Mon
> >> > 04/11/2011 18:07 PM ossec-win-hb: Heartbeat: ossec-
> >> > agent.exe 5852 0 4,652 K
> >> >
> >> > <snip>
> >> >
> >> > ** Alert 1302545632.2566243: - local,syslog,
> >> > 2011 Apr 11 18:13:52 (win-server1) <IP>->"C\\Program Files\ossec-agent
> >> > \win_heartbeat.bat"
> >> > Rule: 101003 (level 1) -> 'OSSEC heartbeat: OSSEC Windows is running'
> >> > Src IP: (none)
> >> > User: (none)
> >> > ossec: output: '"C\\Program Files\ossec-agent\win_heartbeat.bat"': Mon
> >> > 04/11/2011 18:13 PM ossec-win-hb: Heartbeat: ossec-
> >> > agent.exe 5852 0 4,644 K
> >> > _____________________________________________________________________________________
> >> >
> >> > Although our current setup is OSSEC v2.4.1, we've been seeing this
> >> > since OSSEC v1.6.1. It's definitely not that the Windows agents are
> >> > disconnecting, as in the bug reports from a few years back, but that
> >> > agent_control is marking them as disconnected when they are still
> >> > connected and still sending events. We have several instances of this
> >> > setup (dev, lab, test, etc) and it happens in all of them.
> >> >
> >> > What is going on here? Is there some reason why the process monitor
> >> > heartbeat isn't registering as an event to the manager? What else
> >> > could be causing agent_control to display the Windows agents as
> >> > disconnected when they're not?
> >> >
> >> > Thanks!
> >> > -Alisha Kloc
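
The status mechanism described in this thread (the manager refreshes an agent file's timestamp only when a keepalive arrives, and tools such as agent_control read that file to decide whether an agent is up) can be sketched in Python. This is only a minimal illustration under stated assumptions, not OSSEC's code: the file name, the function names, and the "Never connected" label are hypothetical, and the 15-minute threshold is the figure Alisha mentions, which the thread does not confirm.

```python
import os
import tempfile

# Assumption: the 15-minute figure quoted in the thread; not confirmed there.
DISCONNECT_AFTER_SECONDS = 15 * 60


def record_keepalive(agent_file, now):
    """Manager side: a keepalive refreshes the agent file's timestamp."""
    with open(agent_file, "a"):
        pass  # create the file if it does not exist yet
    os.utime(agent_file, (now, now))


def record_event(agent_file, now):
    """Ordinary events are processed but, per the thread, never touch the file."""
    return None


def agent_status(agent_file, now):
    """Tool side: status is derived only from the file's modification time."""
    if not os.path.exists(agent_file):
        return "Never connected"  # hypothetical label for this sketch
    age = now - os.path.getmtime(agent_file)
    return "Active" if age < DISCONNECT_AFTER_SECONDS else "Disconnected"


def demo():
    with tempfile.TemporaryDirectory() as queue_dir:
        agent_file = os.path.join(queue_dir, "win-server1")  # hypothetical name
        start = 1_000_000.0  # arbitrary simulated clock, in seconds
        record_keepalive(agent_file, start)
        statuses = [agent_status(agent_file, start + 60)]
        # An hour of heartbeat events every 6 minutes, but no keepalives.
        for minute in range(6, 61, 6):
            record_event(agent_file, start + minute * 60)
        statuses.append(agent_status(agent_file, start + 3600))
        # A single keepalive brings the agent back.
        record_keepalive(agent_file, start + 3600)
        statuses.append(agent_status(agent_file, start + 3660))
        return statuses


if __name__ == "__main__":
    print(demo())
```

In this model an agent that keeps delivering events but whose keepalives stop is reported as disconnected, which matches the symptom reported for the one-way Windows agents.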
