I'm having the exact same problem with Linux agents disconnecting. I've reached the point of running tcpdump to see what the issue is, because it's baffling. We have around 30 registered agents, and only around 5-10 are active at any given time (they seem to rotate in and out). The rest of the time, the log on the client shows it can't connect.
I am wondering if there is a UDP issue with the F5 that the packets are traversing. Here's an example from the agent:

2011/04/11 08:47:34 ossec-agentd: INFO: Event count after '20000': 2226760->2255200 (101%)   <---What does this mean?
2011/04/11 08:55:47 ossec-agentd: WARN: Server unavailable. Setting lock.
2011/04/11 08:56:08 ossec-agentd(4101): WARN: Waiting for server reply (not started). Tried: 'xxx.xxx.xxx.54'.
2011/04/11 08:56:10 ossec-agentd: INFO: Trying to connect to server (xxx.xxx.xxx.54:1514).
2011/04/11 08:56:31 ossec-agentd(4101): WARN: Waiting for server reply (not started). Tried: 'xxx.xxx.xxx.54'.
2011/04/11 08:56:51 ossec-agentd: INFO: Trying to connect to server (xxx.xxx.xxx.54:1514).

Kind Regards,
Rob

On Mon, Apr 11, 2011 at 12:22 PM, dan (ddp) <[email protected]> wrote:
> It doesn't look like a very busy system. I'm not sure how else to
> figure out what's going on. You could go high tech and use ktrace or
> something to see where it's spending its time, or low tech by turning
> on debugging (run it with -d).
> Sorry I can't be much help with this one.
>
> On Mon, Mar 28, 2011 at 3:44 PM, Doug Burks <[email protected]> wrote:
> > 41 agents total.
> >
> > Here are the stats from /var/ossec/stats/hourly-average:
> > for i in *; do echo -n "$i "; cat $i; echo ""; done |sort -n
> > 0 144467
> > 1 135681
> > 2 143439
> > 3 139292
> > 4 143869
> > 5 139974
> > 6 143945
> > 7 156203
> > 8 179020
> > 9 199613
> > 10 220229
> > 11 199679
> > 12 235240
> > 13 200294
> > 14 171326
> > 15 173679
> > 16 165433
> > 17 116530
> > 18 94434
> > 19 88046
> > 20 105235
> > 21 98339
> > 22 93802
> > 23 104293
> > 24 1124
> >
> > Most of the alerts are Windows events coming from domain controllers.
> >
> > Thanks,
> > --
> > Doug Burks, GSE, CISSP
> > President, Greater Augusta ISSA
> > http://augusta.issa.org
> > http://securityonion.blogspot.com
> >
> > On Mon, Mar 28, 2011 at 3:25 PM, dan (ddp) <[email protected]> wrote:
> >> How many agents? How many events per second? What kind of alerts are
> >> you seeing most of?
> >>
> >> On Mon, Mar 14, 2011 at 5:17 PM, Doug Burks <[email protected]> wrote:
> >>> Agreed. Any ideas on how to find out why analysisd is at 99% cpu? :)
> >>>
> >>> Thanks,
> >>> Doug Burks
> >>>
> >>> On Mon, Mar 14, 2011 at 3:04 PM, dan (ddp) <[email protected]> wrote:
> >>>> I'd start by trying to find out why analysisd is at 99% cpu.
> >>>>
> >>>> On Fri, Mar 11, 2011 at 2:08 PM, Doug Burks <[email protected]> wrote:
> >>>>> Was there ever any conclusion on this problem? I have an OSSEC 2.5.1
> >>>>> server with 43 agents. ossec-analysisd is using 99% CPU! Unix agents
> >>>>> periodically disconnect and will eventually reconnect. What can I do
> >>>>> to troubleshoot this further?
> >>>>> Thanks,
> >>>>> Doug Burks
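
For reference, the hourly figures quoted above can be turned into a rough events-per-second number, which is what dan asked for. This is only a minimal Python sketch, and it rests on two assumptions: that each value in /var/ossec/stats/hourly-average is an event count for that hour of the day, and that the extra "24" entry (1124) is not an hour of the day and can be left out. The function and variable names are made up for illustration and are not part of OSSEC.

```python
# Hourly averages as quoted in the thread (hours 0-23).
# Assumption: each value is a per-hour event count; the "24" entry (1124)
# is omitted because the thread does not say what it represents.
HOURLY_AVERAGE = {
    0: 144467, 1: 135681, 2: 143439, 3: 139292, 4: 143869, 5: 139974,
    6: 143945, 7: 156203, 8: 179020, 9: 199613, 10: 220229, 11: 199679,
    12: 235240, 13: 200294, 14: 171326, 15: 173679, 16: 165433, 17: 116530,
    18: 94434, 19: 88046, 20: 105235, 21: 98339, 22: 93802, 23: 104293,
}

SECONDS_PER_HOUR = 3600.0


def events_per_second(events_per_hour):
    """Convert a per-hour event count into an average per-second rate."""
    return events_per_hour / SECONDS_PER_HOUR


def summarize(hourly):
    """Return the busiest hour, its rate, and the mean rate across all hours."""
    peak_hour = max(hourly, key=hourly.get)
    mean_per_hour = sum(hourly.values()) / len(hourly)
    return {
        "peak_hour": peak_hour,
        "peak_eps": events_per_second(hourly[peak_hour]),
        "mean_eps": events_per_second(mean_per_hour),
    }


if __name__ == "__main__":
    summary = summarize(HOURLY_AVERAGE)
    print("busiest hour: %d" % summary["peak_hour"])
    print("events/sec at busiest hour: %.1f" % summary["peak_eps"])
    print("mean events/sec over the day: %.1f" % summary["mean_eps"])
```

Under those assumptions the busiest hour works out to roughly 65 events per second, and the daily mean to roughly 42.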
