Hi John,

It seems to me that syscheck got stuck while trying to read a file (probably in the middle of a system call, since you can't kill it). In a normal environment we check whether the file is regular (and not a socket, device, etc.) before we read it. However, in a virtual environment this check may be failing.
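To illustrate the kind of check described above, here is a minimal sketch in Python. It is not the syscheck source code; the function names `is_regular_file` and `read_if_regular` and the 1 MiB read limit are made up for this example. It only shows the general idea of refusing to read anything that is not a regular file.

```python
import os
import stat


def is_regular_file(path):
    """Return True only if path is a regular file (not a socket, device, FIFO, ...)."""
    try:
        # lstat() inspects the entry itself and does not follow symlinks.
        mode = os.lstat(path).st_mode
    except OSError:
        # Missing or inaccessible entries are treated as "do not read".
        return False
    return stat.S_ISREG(mode)


def read_if_regular(path, limit=1 << 20):
    """Read up to `limit` bytes from path, or return None if it is not a regular file."""
    if not is_regular_file(path):
        return None
    with open(path, "rb") as handle:
        return handle.read(limit)
```

If the file type reported by the kernel is wrong or misleading inside a guest, a check like this could pass for something that then blocks on read, which would match the symptom of a process stuck inside a system call.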
Can you try the following? Before it starts to run out of control, start strace to see what is happening. For example:

r...@ourhome:/root# ps auwx |grep syscheckd
root 26897 1.4 0.0 2028 524 ? R 17:05 0:02 /var/ossec/bin/ossec-syscheckd
r...@ourhome:/root# strace -F -T -p 26897

It should let us know where it gets stuck after a while...

*You may also want to redirect the output to a file: "strace -F -T -p pid > /tmp/log 2>&1"

Let us know the results and we will try to find out the issue.

Thanks,

--
Daniel B. Cid
dcid ( at ) ossec.net

On Wed, Mar 18, 2009 at 1:50 PM, John A. Sullivan III <[email protected]> wrote:
>
> Hello, all. We are suddenly having a bit of a nightmare with our
> otherwise usually delightful OSSEC. We've installed it on a dual quad
> core AMD server with 32GB of RAM running CentOS 5.2 but with kernel
> 2.6.28.7 (the CentOS kernel panics with open-iscsi) and VServer
> 2.3.0.36.7.
>
> After a while, a syscheckd process spins completely out of control
> consuming 100% of one processor. It refuses to die. kill does not
> work, kill -9 does not work, service ossec stop does not work. Only
> rebooting seems to work. The console is flooded with:
> BUG: soft lockup - CPU#3 stuck for 61s! [ossec-syscheckd:4625]
>
> The VServer host (the source of the runaway process) is an OSSEC agent.
> Originally, the OSSEC server was running as one of its guests but we
> thought that was the problem. We moved the OSSEC server to another
> piece of hardware yet the problem has persisted.
>
> We are using OSSEC http://www.ossec.net/files/ossec-hids-2.0.tar.gz
> downloaded today. Checksum matched.
>
> Here is the log since the last start. Notice that it thinks syscheckd
> has stopped:
> 2009/03/18 11:55:43 ossec-execd: INFO: Started (pid: 4613).
> 2009/03/18 11:55:43 ossec-agentd(1410): INFO: Reading authentication keys
> file.
> 2009/03/18 11:55:43 ossec-agentd: INFO: No previous counter available for
> 'vserver'.
> 2009/03/18 11:55:43 ossec-agentd: INFO: Assigning counter for agent
> vserver01: '0:0'.
> 2009/03/18 11:55:43 ossec-agentd: INFO: Assigning sender counter: 3:3930
> 2009/03/18 11:55:43 ossec-agentd: INFO: Started (pid: 4617).
> 2009/03/18 11:55:43 ossec-agentd: INFO: Server IP Address: 172.x.x.30
> 2009/03/18 11:55:43 ossec-agentd: INFO: Trying to connect to server
> (172.x.x.30:1514).
> 2009/03/18 11:55:44 ossec-agentd(4102): INFO: Connected to the server
> (172.x.x.30:1514).
> 2009/03/18 11:55:47 ossec-syscheckd: INFO: Started (pid: 4625).
> 2009/03/18 11:55:47 ossec-rootcheck: INFO: Started (pid: 4625).
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/var/log/messages'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/var/log/secure'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/var/log/maillog'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/var/log/cron'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/basevs/var/log/messages'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/h01/var/log/messages'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/ns02/var/log/messages'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/basevs/var/log/secure'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/h01/var/log/secure'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/ns02/var/log/secure'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/basevs/var/log/maillog'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/h01/var/log/maillog'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/ns02/var/log/maillog'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/basevs/var/log/cron'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/h01/var/log/cron'.
> 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file:
> '/vservers/ns02/var/log/cron'.
> 2009/03/18 11:55:49 ossec-logcollector: INFO: Started (pid: 4621).
> 2009/03/18 11:58:28 ossec-syscheckd: Error opening directory:
> '/user/local/sbin': No such file or directory
> 2009/03/18 11:59:13 ossec-syscheckd: Error opening directory:
> '/vservers/ns02/user/local/sbin': No such file or directory
> 2009/03/18 12:01:13 ossec-syscheckd: INFO: Starting syscheck scan (db).
> 2009/03/18 12:09:53 ossec-syscheckd: INFO: Ending syscheck scan (db).
> 2009/03/18 12:10:13 ossec-rootcheck: INFO: Starting rootcheck scan.
>
>
> Here is ossec.conf on the VServer host:
> <ossec_config>
> <client>
> <server-ip>172.30.10.30</server-ip>
> </client>
>
> <syscheck>
> <!-- Frequency that syscheck is executed - default to every 6 hours -->
> <frequency>21600</frequency>
> <alert_new_files>yes</alert_new_files>
>
> <!-- Directories to check (perform all possible verifications) -->
> <directories check_all="yes">/etc,/usr/bin,/usr/sbin</directories>
> <directories
> check_all="yes">/bin,/sbin,/usr/local/bin,/user/local/sbin,/usr/local/etc</directories>
> <directories
> check_all="yes">/vservers/ns02/etc,/vservers/ns02/usr/bin,/vservers/ns02/usr/sbin</directories>
> <directories
> check_all="yes">/vservers/ns02/bin,/vservers/ns02/sbin,/vservers/ns02/usr/local/bin,/vservers/ns02/user/local/sbin,/vservers/ns02/usr/local/etc</directories>
>
> <!-- Files/directories to ignore -->
> <ignore>/etc/mtab</ignore>
> <ignore>/etc/mnttab</ignore>
> <ignore>/etc/hosts.deny</ignore>
> <ignore>/etc/mail/statistics</ignore>
> <ignore>/etc/random-seed</ignore>
> <ignore>/etc/adjtime</ignore>
> <ignore>/etc/httpd/logs</ignore>
> <ignore>/etc/utmpx</ignore>
> <ignore>/etc/wtmpx</ignore>
> <ignore>/etc/cups/certs</ignore>
> <ignore>/etc/dumpdates</ignore>
> <ignore>/etc/svc/volatile</ignore>
>
> <!-- Windows files to ignore -->
> <ignore>C:\WINDOWS/System32/LogFiles</ignore>
> <ignore>C:\WINDOWS/Debug</ignore>
> <ignore>C:\WINDOWS/WindowsUpdate.log</ignore>
> <ignore>C:\WINDOWS/iis6.log</ignore>
> <ignore>C:\WINDOWS/system32/wbem/Logs</ignore>
> <ignore>C:\WINDOWS/system32/wbem/Repository</ignore>
> <ignore>C:\WINDOWS/Prefetch</ignore>
> <ignore>C:\WINDOWS/PCHEALTH/HELPCTR/DataColl</ignore>
> <ignore>C:\WINDOWS/SoftwareDistribution</ignore>
> <ignore>C:\WINDOWS/Temp</ignore>
> <ignore>C:\WINDOWS/system32/config</ignore>
> <ignore>C:\WINDOWS/system32/spool</ignore>
> <ignore>C:\WINDOWS/system32/CatRoot</ignore>
> </syscheck>
>
> <rootcheck>
>
> <rootkit_files>/usr/local/ossec/etc/shared/rootkit_files.txt</rootkit_files>
>
> <rootkit_trojans>/usr/local/ossec/etc/shared/rootkit_trojans.txt</rootkit_trojans>
>
> <system_audit>/usr/local/ossec/etc/shared/system_audit_rcl.txt</system_audit>
>
> <system_audit>/usr/local/ossec/etc/shared/cis_debian_linux_rcl.txt</system_audit>
>
> <system_audit>/usr/local/ossec/etc/shared/cis_rhel_linux_rcl.txt</system_audit>
>
> <system_audit>/usr/local/ossec/etc/shared/cis_rhel5_linux_rcl.txt</system_audit>
> </rootcheck>
> <!-- Files to monitor (localfiles) -->
>
> <localfile>
> <log_format>syslog</log_format>
> <location>/var/log/messages</location>
> </localfile>
>
> <localfile>
> <log_format>syslog</log_format>
> <location>/var/log/secure</location>
> </localfile>
>
> <localfile>
> <log_format>syslog</log_format>
> <location>/var/log/maillog</location>
> </localfile>
>
> <localfile>
> <log_format>syslog</log_format>
> <location>/var/log/cron</location>
> </localfile>
>
> <localfile>
> <log_format>syslog</log_format>
> <location>/vservers/[a-zA-Z0-9]*/var/log/messages</location>
> </localfile>
>
> <localfile>
> <log_format>syslog</log_format>
> <location>/vservers/[a-zA-Z0-9]*/var/log/secure</location>
> </localfile>
>
> <localfile>
> <log_format>syslog</log_format>
> <location>/vservers/[a-zA-Z0-9]*/var/log/maillog</location>
> </localfile>
>
> <localfile>
> <log_format>syslog</log_format>
> <location>/vservers/[a-zA-Z0-9]*/var/log/cron</location>
> </localfile>
> </ossec_config>
>
> Any idea what is causing this? How to kill the process without
> rebooting? How to fix it?
>
> We're starting to fall behind on this critical project so any help is
> greatly appreciated. Thanks - John
>
> --
> John A. Sullivan III
> Open Source Development Corporation
> +1 207-985-7880
> [email protected]
>
> http://www.spiritualoutreach.com
> Making Christianity intelligible to secular society
>
>
