These were the messages on the server around midnight, but I know space was not an issue: we have extensive monitoring on the server via Nagios and Cacti, and both showed disk usage at around 50%, just as it is now...
2010/01/01 00:00:00 alerts(1107): ERROR: Unable to create directory: '/logs/archives/2010/'
2010/01/01 00:00:00 ossec-remoted: socketerr (not available).
2010/01/01 00:00:00 ossec-remoted(1210): ERROR: Queue '/queue/ossec/queue' not accessible: 'Connection refused'.
2010/01/01 00:00:03 ossec-remoted(1210): ERROR: Queue '/queue/ossec/queue' not accessible: 'Connection refused'.
2010/01/01 00:00:03 ossec-remoted(1211): ERROR: Unable to access queue: '/queue/ossec/queue'. Giving up..
2010/01/01 00:00:03 ossec-logcollector: socketerr (not available).
2010/01/01 00:01:21 ossec-monitord: socketerr (not available).
2010/01/01 00:01:21 ossec-monitord(1224): ERROR: Error sending message to queue.
2010/01/01 00:01:21 ossec-monitord: socketerr (not available).
2010/01/01 00:01:21 ossec-monitord(1224): ERROR: Error sending message to queue.
2010/01/01 00:01:21 ossec-monitord: socketerr (not available).
2010/01/01 00:01:21 ossec-monitord(1224): ERROR: Error sending message to queue.
2010/01/01 00:01:21 ossec-monitord: socketerr (not available).
2010/01/01 00:01:21 ossec-monitord(1224): ERROR: Error sending message to queue.
2010/01/01 00:01:43 ossec-monitord: Compression error: /logs/alerts/2009/Dec/ossec-alerts-31.log.gz: No space left on device
2010/01/01 00:01:43 ossec-monitord: Compression error: /logs/alerts/2009/Dec/ossec-alerts-31.log.gz: No space left on device
2010/01/01 00:04:23 ossec-logcollector: socketerr (not available).
2010/01/01 00:05:50 ossec-monitord: socketerr (not available).
2010/01/01 00:05:50 ossec-monitord(1224): ERROR: Error sending message to queue.
2010/01/01 00:06:33 ossec-logcollector: socketerr (not available).
2010/01/01 00:07:50 ossec-monitord: socketerr (not available).
2010/01/01 00:08:43 ossec-logcollector: socketerr (not available).
2010/01/01 00:10:53 ossec-logcollector: socketerr (not available).
2010/01/01 00:13:03 ossec-logcollector: socketerr (not available).
2010/01/01 01:31:52 ossec-syscheckd: socketerr (not available).
2010/01/01 01:31:52 ossec-syscheckd(1224): ERROR: Error sending message to queue.
2010/01/01 01:31:55 ossec-syscheckd(1210): ERROR: Queue '/var/ossec/queue/ossec/queue' not accessible: 'Connection refused'.
2010/01/01 01:31:55 ossec-syscheckd(1211): ERROR: Unable to access queue: '/var/ossec/queue/ossec/queue'. Giving up..
2010/01/02 00:02:01 ossec-monitord(1103): ERROR: Unable to open file '/logs/archives/2010/Jan/ossec-archive-01.log.sum'.
2010/01/02 00:02:01 ossec-monitord: File '/logs/alerts/2010/Jan/ossec-alerts-01.log' not found. MD5 checksum skipped.
2010/01/02 00:02:01 ossec-monitord: File '/logs/alerts/2010/Jan/ossec-alerts-01.log' not found. SHA1 checksum skipped.
2010/01/02 00:02:01 ossec-monitord(1103): ERROR: Unable to open file '/logs/alerts/2010/Jan/ossec-alerts-01.log.sum'.
2010/01/02 00:02:01 ossec-monitord(1103): ERROR: Unable to open file '/logs/firewall/2010/Jan/ossec-firewall-01.log.sum'.

I restarted the ossec service on the 4th, and from then on all the logs have been created for 2010/Jan. The odd part is that the emails only report the disconnects and never the reconnects, unless I actually restart the agent from the Windows GUI on the hosts. Hosts newly added as of yesterday are "immune" and have not been disconnected so far (10 new hosts running 2.3).

On Fri, Jan 8, 2010 at 1:56 PM, Daniel Cid <[email protected]> wrote:
> Hi Rich,
>
> Strange issue for sure... I have not heard of anyone else having problems during
> the 2010 change and I don't think we have anything in the code that would be affected
> by the year change.
>
> What is showing up on the manager's log? Can you check if it has any partition full or
> anything like that? For all the agents to be affected, I would guess an issue on the
> manager side...
>
> Thanks,
>
> --
> Daniel B. Cid
> dcid ( at ) ossec.net
>
> On Thu, Jan 7, 2010 at 11:54 AM, Rich Rumble <[email protected]> wrote:
>> Still on going issues here...
>> I'm getting emails every 2 minutes listing
>> all the host's disconnecting from the server. They seem to be going in
>> order of their number, the last 160 disconnects first, then 159 etc.. all
>> the way down to 001, then all host's start to come back in the opposite
>> direction, 001, then 002 etc... The server has been rebooted, I cannot reboot
>> the clients, but their services have been restarted, keys copied over...
>> When they are "connected" after the whole cycle they never email about
>> being connected, their status just shows connected:
>> /var/ossec/bin/agent_control -lc shows the connected boxes
>> ... Again the server is running 2.3 code and most clients are 2.2
>> No changes for the past 6 months other than upgrading to 2.3 on the
>> server a few weeks ago. The clients are showing the issues I stated
>> previously, again
>> all of their trouble began at the roll over to 2010.
>>
>>> 2010/01/02 01:43:08 ossec-agent: Error waiting mutex (timeout).
>>> 2010/01/02 01:44:53 ossec-agent: Error waiting mutex (timeout).
>>> 2010/01/02 01:46:38 ossec-agent: Error waiting mutex (timeout).
>>> 2010/01/02 01:48:23 ossec-agent: Error waiting mutex (timeout).
>>> 2010/01/02 01:50:08 ossec-agent: Error waiting mutex (timeout).
>>> 2010/01/02 01:50:17 ossec-agent: INFO: Trying to connect to server
>>> (10.2.2.6:1514).
>>> and on and on.
>>> I've restarted the server, restarted the agents to no avail. Some are
>>> now reporting duplicate counter
>>> errors, and deleting the rids files is not fixing them this time
>>> around.
>>> The server is 2.3 and most agents are 2.2 windows only.
>>
>
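
A note on the "No space left on device" lines in the manager log: that error (ENOSPC) can also be returned when a filesystem has run out of inodes, or when a partition other than the monitored one is full, so a disk-usage graph at 50% does not by itself rule it out. Below is a minimal sketch in Python for checking both block and inode usage on the manager. `usage_percent` is a hypothetical helper name, and `/var/ossec/logs` is only an assumed location for the log directory (the log itself shows paths starting at `/logs`), so adjust the path to the real install.

```python
import os


def usage_percent(path):
    """Return (block_pct, inode_pct) in use for the filesystem holding path."""
    st = os.statvfs(path)
    # Blocks in use as a share of all blocks on the filesystem.
    block_pct = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    # Some filesystems report 0 total inodes; treat that as "not applicable".
    inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return block_pct, inode_pct


if __name__ == "__main__":
    # "/var/ossec/logs" is an assumption; fall back to "/" if it does not exist.
    target = "/var/ossec/logs" if os.path.exists("/var/ossec/logs") else "/"
    blocks, inodes = usage_percent(target)
    print("%s: blocks %.1f%% used, inodes %.1f%% used" % (target, blocks, inodes))
```

If the inode figure is near 100% while the block figure sits around 50%, that would be consistent with the compression errors above; if both are low, the cause is probably elsewhere.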
