Hello, I just wanted to share my experience with agents showing as disconnected in agent_control. I have a setup with 200+ agents deployed, and every agent was in a connected state until 2 days ago. I have some Windows agents, but most of them are Unix (RedHat/CentOS/AIX). I have several VLANs; some are protected by a firewall, others are not, etc. I don't use SELinux, and iptables has always worked before. We didn't make any massive changes to the infrastructure.
I have Zabbix graphs keeping an eye on all connected OSSEC agents, which gives us a history of all deployed agents. Everything was fine until 2 days ago, when the count dropped from 169 to 15. After running agent_control -l on the OSSEC server, I discovered that most of the agents were marked as disconnected.
I also discovered that some of the sysadmins had been cloning machines, and with them the OSSEC keys, across multiple machines. That created issues with the rids files (I have an OSSEC rule warning me about such issues, and I got way too many emails to ignore the problem). I found the duplicate keys and fixed them, then started working on the disconnected agents.
So, after spending 2 days trying to figure out why most of my agents were disconnected and reading every piece of info I could find about it, I finally found the root cause: it is related to the rids files under the queue/rids directory. I cleaned them up on each side (server and clients), and everything is back to normal.
I thought I would share the story in case anybody is in the same situation. I didn't want to move my server, and I didn't want to upgrade it (well, I'm already running 2.6).
-StephaneR
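For anyone who wants to script the rids cleanup rather than do it by hand, here is a minimal Python sketch. It is not the exact procedure used above, only an illustration under a few assumptions: the install directory is /var/ossec (adjust if yours differs), OSSEC has been stopped on the host before the files are removed, and the helper name clear_rids is hypothetical. It defaults to a dry run that only lists what it would delete.

```python
from pathlib import Path


def clear_rids(ossec_home="/var/ossec", dry_run=True):
    """List the files under <ossec_home>/queue/rids and delete them unless dry_run.

    Assumptions (not from the original post): /var/ossec is the install
    directory, and OSSEC is stopped on this host before calling this.
    """
    rids_dir = Path(ossec_home) / "queue" / "rids"
    if not rids_dir.is_dir():
        raise FileNotFoundError(f"no rids directory at {rids_dir}")

    # Only regular files are touched; the rids directory itself stays in place.
    targets = sorted(p for p in rids_dir.iterdir() if p.is_file())

    if not dry_run:
        for path in targets:
            path.unlink()

    return targets
```

Calling clear_rids() with the defaults only returns the list of files, so it can be reviewed first; clear_rids(dry_run=False) actually removes them. The same would need to be run on the server and on each affected agent, followed by a restart of OSSEC on both sides.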
