Hi Rich,

Looking at your logs, it seems that your permissions are wrong (or there is a hardware problem), because OSSEC was not able to create the proper directories and log files:

2010/01/01 00:00:00 alerts(1107): ERROR: Unable to create directory: '/logs/archives/2010/'

Because of that, analysisd probably crashed, causing all the other processes to stop working... Try re-running install.sh and updating to version 2.3 to see if that fixes the problem.

Thanks,

--
Daniel B. Cid
dcid ( at ) ossec.net

On Fri, Jan 8, 2010 at 4:11 PM, Rich Rumble <[email protected]> wrote:
> These were messages on the server around midnight, but I know space was not
> an issue as we have extensive monitoring on the server via nagios and
> cacti, both showing du was around 50% just as it is now...
>
> 2010/01/01 00:00:00 alerts(1107): ERROR: Unable to create directory:
> '/logs/archives/2010/'
> 2010/01/01 00:00:00 ossec-remoted: socketerr (not available).
> 2010/01/01 00:00:00 ossec-remoted(1210): ERROR: Queue
> '/queue/ossec/queue' not accessible: 'Connection refused'.
> 2010/01/01 00:00:03 ossec-remoted(1210): ERROR: Queue
> '/queue/ossec/queue' not accessible: 'Connection refused'.
> 2010/01/01 00:00:03 ossec-remoted(1211): ERROR: Unable to access
> queue: '/queue/ossec/queue'. Giving up..
> 2010/01/01 00:00:03 ossec-logcollector: socketerr (not available).
> 2010/01/01 00:01:21 ossec-monitord: socketerr (not available).
> 2010/01/01 00:01:21 ossec-monitord(1224): ERROR: Error sending message to
> queue.
> 2010/01/01 00:01:21 ossec-monitord: socketerr (not available).
> 2010/01/01 00:01:21 ossec-monitord(1224): ERROR: Error sending message to
> queue.
> 2010/01/01 00:01:21 ossec-monitord: socketerr (not available).
> 2010/01/01 00:01:21 ossec-monitord(1224): ERROR: Error sending message to
> queue.
> 2010/01/01 00:01:21 ossec-monitord: socketerr (not available).
> 2010/01/01 00:01:21 ossec-monitord(1224): ERROR: Error sending message to
> queue.
> 2010/01/01 00:01:43 ossec-monitord: Compression error:
> /logs/alerts/2009/Dec/ossec-alerts-31.log.gz: No space left on device
> 2010/01/01 00:01:43 ossec-monitord: Compression error:
> /logs/alerts/2009/Dec/ossec-alerts-31.log.gz: No space left on device
> 2010/01/01 00:04:23 ossec-logcollector: socketerr (not available).
> 2010/01/01 00:05:50 ossec-monitord: socketerr (not available).
> 2010/01/01 00:05:50 ossec-monitord(1224): ERROR: Error sending message to
> queue.
> 2010/01/01 00:06:33 ossec-logcollector: socketerr (not available).
> 2010/01/01 00:07:50 ossec-monitord: socketerr (not available).
> 2010/01/01 00:08:43 ossec-logcollector: socketerr (not available).
> 2010/01/01 00:10:53 ossec-logcollector: socketerr (not available).
> 2010/01/01 00:13:03 ossec-logcollector: socketerr (not available).
> 2010/01/01 01:31:52 ossec-syscheckd: socketerr (not available).
> 2010/01/01 01:31:52 ossec-syscheckd(1224): ERROR: Error sending
> message to queue.
> 2010/01/01 01:31:55 ossec-syscheckd(1210): ERROR: Queue
> '/var/ossec/queue/ossec/queue' not accessible: 'Connection refused'.
> 2010/01/01 01:31:55 ossec-syscheckd(1211): ERROR: Unable to access
> queue: '/var/ossec/queue/ossec/queue'. Giving up..
> 2010/01/02 00:02:01 ossec-monitord(1103): ERROR: Unable to open file
> '/logs/archives/2010/Jan/ossec-archive-01.log.sum'.
> 2010/01/02 00:02:01 ossec-monitord: File
> '/logs/alerts/2010/Jan/ossec-alerts-01.log' not found. MD5 checksum
> skipped.
> 2010/01/02 00:02:01 ossec-monitord: File
> '/logs/alerts/2010/Jan/ossec-alerts-01.log' not found. SHA1 checksum
> skipped.
> 2010/01/02 00:02:01 ossec-monitord(1103): ERROR: Unable to open file
> '/logs/alerts/2010/Jan/ossec-alerts-01.log.sum'.
> 2010/01/02 00:02:01 ossec-monitord(1103): ERROR: Unable to open file
> '/logs/firewall/2010/Jan/ossec-firewall-01.log.sum'.
>
> I restarted the ossec service on the 4th, and from then on, all the
> logs have been created for 2010/Jan.
>
> The odd part is that the emails only report the disconnects, and never the
> reconnects, unless I am actually restarting from the Windows GUI on the hosts.
> Newly added hosts, as of yesterday, are "immune" and are not being
> disconnected so far (10 new hosts running 2.3).
>
> On Fri, Jan 8, 2010 at 1:56 PM, Daniel Cid <[email protected]> wrote:
>> Hi Rich,
>>
>> Strange issue for sure... I have not heard of anyone else having problems
>> during the 2010 change and I don't think we have anything in the code that
>> would be affected by the year change.
>>
>> What is showing up in the manager's log? Can you check if it has any
>> partition full or anything like that? For all the agents to be affected,
>> I would guess an issue on the manager side...
>>
>> Thanks,
>>
>> --
>> Daniel B. Cid
>> dcid ( at ) ossec.net
>>
>> On Thu, Jan 7, 2010 at 11:54 AM, Rich Rumble <[email protected]> wrote:
>>> Still ongoing issues here... I'm getting emails every 2 minutes listing
>>> all the hosts disconnecting from the server. They seem to be going in
>>> order of their number, the last 160 disconnects first, then 159 etc.. all
>>> the way down to 001, then all hosts start to come back in the opposite
>>> direction, 001, then 002 etc... The server has been rebooted, I cannot
>>> reboot the clients, but their services have been restarted, keys copied
>>> over...
>>> When they are "connected" after the whole cycle they never email about
>>> being connected, their status just shows connected:
>>> /var/ossec/bin/agent_control -lc shows the connected boxes
>>> ... Again the server is running 2.3 code and most clients are 2.2
>>> No changes for the past 6 months other than upgrading to 2.3 on the
>>> server a few weeks ago. The clients are showing the issues I stated
>>> previously, again all of their trouble began at the rollover to 2010.
>>>
>>>> 2010/01/02 01:43:08 ossec-agent: Error waiting mutex (timeout).
>>>> 2010/01/02 01:44:53 ossec-agent: Error waiting mutex (timeout).
>>>> 2010/01/02 01:46:38 ossec-agent: Error waiting mutex (timeout).
>>>> 2010/01/02 01:48:23 ossec-agent: Error waiting mutex (timeout).
>>>> 2010/01/02 01:50:08 ossec-agent: Error waiting mutex (timeout).
>>>> 2010/01/02 01:50:17 ossec-agent: INFO: Trying to connect to server
>>>> (10.2.2.6:1514).
>>>> and on and on.
>>>> I've restarted the server, restarted the agents to no avail. Some are
>>>> now reporting duplicate counter
>>>> errors, and deleting the rids files is not fixing them this time
>>>> around.
>>>> The server is 2.3 and most agents are 2.2 windows only.
>>>
>>
>
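
For anyone checking the manager for a full partition or a permissions problem, as suggested above: a filesystem can return "No space left on device" even when byte usage looks like 50%, for example when it has run out of inodes, so it may be worth looking at both, plus whether a directory can actually be created. The following is only a rough Python sketch of such a check, not anything shipped with OSSEC. The function name `check_log_dir` is hypothetical, and the path `/var/ossec/logs` mentioned afterwards is an assumption inferred from the `/var/ossec/queue/ossec/queue` path in the logs.

```python
import os
import tempfile


def check_log_dir(path):
    """Summarise free space, free inodes and writability for `path`.

    Hypothetical helper for illustration only; not part of OSSEC.
    """
    st = os.statvfs(path)  # POSIX only
    report = {
        # Bytes available to unprivileged users on this filesystem.
        "free_bytes": st.f_bavail * st.f_frsize,
        # Inodes available to unprivileged users (some filesystems report 0).
        "free_inodes": st.f_favail,
        "can_create_dir": False,
        "error": None,
    }
    try:
        # Create and remove a throwaway subdirectory, much like the
        # year/month directories that get created at rollover.
        probe = tempfile.mkdtemp(prefix=".probe-", dir=path)
        os.rmdir(probe)
        report["can_create_dir"] = True
    except OSError as exc:
        # For example "Permission denied" or "No space left on device".
        report["error"] = exc.strerror
    return report
```

Calling `check_log_dir("/var/ossec/logs")` as the same user the OSSEC daemons run as (the path is assumed, adjust to your install) would show whether the failure is down to space, inodes or permissions. Running it as root can hide a permissions problem, since root can usually create the directory anyway.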
