Which versions of OSSEC were running on the server and on agents? 

On Monday, January 14, 2013 4:51:38 PM UTC-8, Tony Trummer wrote:
>
> Just to add to the numerous "agent disconnection" issues reported with 
> OSSEC.
>
> I have approximately 250 CentOS agents that speak to a server and all have 
> no issues connecting if both sides are restarted, but after a few hours 
> they all begin to die off and when I come in the next morning some random 
> amount will have disconnected (more than just a couple and not necessarily 
> the same ones). If I restart the processes, they will ALL eventually 
> reconnect, without exception.
>
> I have run tcpdump on both sides and verified communication exists in both 
> directions.
>
> In most cases, it appears that the client sends a message, the server 
> receives it, but never responds. 
>
> I don't see anything in /var/ossec/logs/ossec.log and I've used strace on 
> the remoted and monitord processes, but thus far have not been able to 
> narrow down the issue.
>
> In one instance, I watched three successive failures from the agent's 
> perspective, followed by a successful connection, in strace. There are 
> differences between the traces, but I can't see their relevance.
>
> Am I looking in the right place?
>
> Here's an example of a failed client request on the server side trace 
> (abbreviated).
>
> 1. recvfrom(4, ...) - I believe fd 4 is the network socket
> 2. stat("/queue/ossec/.wait...) -1 ENOENT ...
> 3. sendto(5, "1:(hostname.foo.bar.com) 1"... - I believe fd 5 is the 
> remoted process?
>
> Here's what the successful attempt appears to look like:
>
> 1. recvfrom(4,...)
> 2. time(NULL)
> 3. time(NULL)
> 4. brk(some hex)
> 5. brk(some hex)
> 6. brk(some hex)
> 7. brk(some different hex)
> 8. write(222, "48:3720:", 8) - my guess is this is either 
> /queue/rids/<agent_key> or /queue/rids/sender_counter
> 9. lseek(222,0, SEEK_SET)
> 10. sendto(4,...)
>
> The replay protection check has been disabled, because I suspected it was 
> a culprit. 
>
> Any ideas?
>
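
One way to make the comparison between the two traces less of an eyeball exercise is to reduce each strace excerpt to its sequence of calls and diff them. Below is a minimal Python sketch, not anything that ships with OSSEC: the function names and the regular expression are my own, and the sample text is just the abbreviated steps from the message above, with the elided parts left elided. It assumes plain strace output with an optional PID or timestamp prefix.

```python
import difflib
import re

# Syscall name at the start of an strace line, optionally preceded by a PID
# and/or a HH:MM:SS timestamp. If the first argument is a bare integer
# (typically a file descriptor), it is captured as well.
CALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?"
    r"([a-z_][a-z0-9_]*)\((?:(\d+)(?=[,)]))?"
)

def call_tokens(trace_text):
    """Turn strace text into tokens such as 'sendto(5)' or 'time'."""
    tokens = []
    for line in trace_text.splitlines():
        match = CALL_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not start with a syscall
        name, fd = match.groups()
        tokens.append(f"{name}({fd})" if fd is not None else name)
    return tokens

def collapse_runs(tokens):
    """Collapse consecutive repeats so 'brk, brk, brk' becomes one 'brk'."""
    collapsed = []
    for token in tokens:
        if not collapsed or collapsed[-1] != token:
            collapsed.append(token)
    return collapsed

def diff_traces(failed_text, ok_text):
    """Unified diff of the collapsed call sequences of two traces."""
    failed = collapse_runs(call_tokens(failed_text))
    ok = collapse_runs(call_tokens(ok_text))
    return list(difflib.unified_diff(failed, ok, "failed", "successful",
                                     lineterm=""))

# Sample input: the abbreviated steps as posted, not real strace output.
FAILED = '''recvfrom(4, ...)
stat("/queue/ossec/.wait...) -1 ENOENT ...
sendto(5, "1:(hostname.foo.bar.com) 1"...
'''

OK = '''recvfrom(4,...)
time(NULL)
time(NULL)
brk(some hex)
brk(some hex)
brk(some hex)
brk(some different hex)
write(222, "48:3720:", 8)
lseek(222,0, SEEK_SET)
sendto(4,...)
'''

if __name__ == "__main__":
    print("\n".join(diff_traces(FAILED, OK)))
```

On the abbreviated steps quoted above, the diff shows the failed exchange ending in a sendto on fd 5 while the successful one ends in a sendto on fd 4. It may be worth confirming what each descriptor actually is (for example with ls -l /proc/<pid>/fd on the server) before reading anything into that.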
