Hello, all. We are suddenly having a bit of a nightmare with our otherwise usually delightful OSSEC. We've installed it on a dual quad-core AMD server with 32GB of RAM running CentOS 5.2, but with kernel 2.6.28.7 (the CentOS kernel panics with open-iscsi) and VServer 2.3.0.36.7.
After a while, a syscheckd process spins completely out of control consuming 100% of one processor. It refuses to die. kill does not work, kill -9 does not work, service ossec stop does not work. Only rebooting seems to work. The console is flooded with:

BUG: soft lockup - CPU#3 stuck for 61s! [ossec-syscheckd:4625]

The VServer host (the source of the runaway process) is an OSSEC agent. Originally, the OSSEC server was running as one of its guests but we thought that was the problem. We moved the OSSEC server to another piece of hardware yet the problem has persisted.

We are using OSSEC http://www.ossec.net/files/ossec-hids-2.0.tar.gz downloaded today. Checksum matched.

Here is the log since the last start. Notice that it thinks syscheckd has stopped:

2009/03/18 11:55:43 ossec-execd: INFO: Started (pid: 4613).
2009/03/18 11:55:43 ossec-agentd(1410): INFO: Reading authentication keys file.
2009/03/18 11:55:43 ossec-agentd: INFO: No previous counter available for 'vserver'.
2009/03/18 11:55:43 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
2009/03/18 11:55:43 ossec-agentd: INFO: Assigning sender counter: 3:3930
2009/03/18 11:55:43 ossec-agentd: INFO: Started (pid: 4617).
2009/03/18 11:55:43 ossec-agentd: INFO: Server IP Address: 172.x.x.30
2009/03/18 11:55:43 ossec-agentd: INFO: Trying to connect to server (172.x.x.30:1514).
2009/03/18 11:55:44 ossec-agentd(4102): INFO: Connected to the server (172.x.x.30:1514).
2009/03/18 11:55:47 ossec-syscheckd: INFO: Started (pid: 4625).
2009/03/18 11:55:47 ossec-rootcheck: INFO: Started (pid: 4625).
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/messages'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/messages'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/messages'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/secure'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/secure'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/secure'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/maillog'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/maillog'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/maillog'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/cron'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/cron'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/cron'.
2009/03/18 11:55:49 ossec-logcollector: INFO: Started (pid: 4621).
2009/03/18 11:58:28 ossec-syscheckd: Error opening directory: '/user/local/sbin': No such file or directory
2009/03/18 11:59:13 ossec-syscheckd: Error opening directory: '/vservers/ns02/user/local/sbin': No such file or directory
2009/03/18 12:01:13 ossec-syscheckd: INFO: Starting syscheck scan (db).
2009/03/18 12:09:53 ossec-syscheckd: INFO: Ending syscheck scan (db).
2009/03/18 12:10:13 ossec-rootcheck: INFO: Starting rootcheck scan.
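As an aside on the two "Error opening directory" lines in the log: they name paths under /user/local/sbin, which syscheckd could not open. The following is a minimal Python sketch, not a diagnosis of the soft lockup, for listing which of the directories configured for syscheck do not exist on the host. The function name missing_syscheck_dirs and its exists parameter are hypothetical helpers invented for this illustration, and the sketch assumes ossec.conf has a single root element, as ours does.

```python
import os
import xml.etree.ElementTree as ET


def missing_syscheck_dirs(conf_text, exists=os.path.isdir):
    """Return the configured <directories> paths for which `exists` is false.

    `conf_text` is the full text of ossec.conf. `exists` is a hypothetical
    hook so the check can be exercised without touching the filesystem.
    """
    root = ET.fromstring(conf_text)  # assumes one root element
    missing = []
    for node in root.iter("directories"):
        # Each <directories> element holds a comma-separated list of paths.
        for path in (node.text or "").split(","):
            path = path.strip()
            if path and not exists(path):
                missing.append(path)
    return missing
```

To use it, read the agent's ossec.conf into a string (the location depends on the install prefix) and pass it to missing_syscheck_dirs; any path it returns is one syscheckd would fail to open.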
Here is ossec.conf on the VServer host:

<ossec_config>
  <client>
    <server-ip>172.30.10.30</server-ip>
  </client>

  <syscheck>
    <!-- Frequency that syscheck is executed - default to every 6 hours -->
    <frequency>21600</frequency>
    <alert_new_files>yes</alert_new_files>

    <!-- Directories to check (perform all possible verifications) -->
    <directories check_all="yes">/etc,/usr/bin,/usr/sbin</directories>
    <directories check_all="yes">/bin,/sbin,/usr/local/bin,/user/local/sbin,/usr/local/etc</directories>
    <directories check_all="yes">/vservers/ns02/etc,/vservers/ns02/usr/bin,/vservers/ns02/usr/sbin</directories>
    <directories check_all="yes">/vservers/ns02/bin,/vservers/ns02/sbin,/vservers/ns02/usr/local/bin,/vservers/ns02/user/local/sbin,/vservers/ns02/usr/local/etc</directories>

    <!-- Files/directories to ignore -->
    <ignore>/etc/mtab</ignore>
    <ignore>/etc/mnttab</ignore>
    <ignore>/etc/hosts.deny</ignore>
    <ignore>/etc/mail/statistics</ignore>
    <ignore>/etc/random-seed</ignore>
    <ignore>/etc/adjtime</ignore>
    <ignore>/etc/httpd/logs</ignore>
    <ignore>/etc/utmpx</ignore>
    <ignore>/etc/wtmpx</ignore>
    <ignore>/etc/cups/certs</ignore>
    <ignore>/etc/dumpdates</ignore>
    <ignore>/etc/svc/volatile</ignore>

    <!-- Windows files to ignore -->
    <ignore>C:\WINDOWS/System32/LogFiles</ignore>
    <ignore>C:\WINDOWS/Debug</ignore>
    <ignore>C:\WINDOWS/WindowsUpdate.log</ignore>
    <ignore>C:\WINDOWS/iis6.log</ignore>
    <ignore>C:\WINDOWS/system32/wbem/Logs</ignore>
    <ignore>C:\WINDOWS/system32/wbem/Repository</ignore>
    <ignore>C:\WINDOWS/Prefetch</ignore>
    <ignore>C:\WINDOWS/PCHEALTH/HELPCTR/DataColl</ignore>
    <ignore>C:\WINDOWS/SoftwareDistribution</ignore>
    <ignore>C:\WINDOWS/Temp</ignore>
    <ignore>C:\WINDOWS/system32/config</ignore>
    <ignore>C:\WINDOWS/system32/spool</ignore>
    <ignore>C:\WINDOWS/system32/CatRoot</ignore>
  </syscheck>

  <rootcheck>
    <rootkit_files>/usr/local/ossec/etc/shared/rootkit_files.txt</rootkit_files>
    <rootkit_trojans>/usr/local/ossec/etc/shared/rootkit_trojans.txt</rootkit_trojans>
    <system_audit>/usr/local/ossec/etc/shared/system_audit_rcl.txt</system_audit>
    <system_audit>/usr/local/ossec/etc/shared/cis_debian_linux_rcl.txt</system_audit>
    <system_audit>/usr/local/ossec/etc/shared/cis_rhel_linux_rcl.txt</system_audit>
    <system_audit>/usr/local/ossec/etc/shared/cis_rhel5_linux_rcl.txt</system_audit>
  </rootcheck>

  <!-- Files to monitor (localfiles) -->
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/messages</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/secure</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/maillog</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/var/log/cron</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/vservers/[a-zA-Z0-9]*/var/log/messages</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/vservers/[a-zA-Z0-9]*/var/log/secure</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/vservers/[a-zA-Z0-9]*/var/log/maillog</location>
  </localfile>
  <localfile>
    <log_format>syslog</log_format>
    <location>/vservers/[a-zA-Z0-9]*/var/log/cron</location>
  </localfile>
</ossec_config>

Any idea what is causing this? How to kill the process without rebooting? How to fix it? We're starting to fall behind on this critical project so any help is greatly appreciated. Thanks - John

--
John A. Sullivan III
Open Source Development Corporation
+1 207-985-7880
[email protected]
http://www.spiritualoutreach.com
Making Christianity intelligible to secular society
