Thanks, Daniel.  I have the trace but it is a 40 MB file.  How shall I
send it to you? - John

On Wed, 2009-03-18 at 17:10 -0300, Daniel Cid wrote:
> Hi John,
> 
> It seems to me that syscheck got stuck while trying to read a file
> (probably in the middle of a system call, since you can't kill it).
> In a normal environment we check that the file is regular (and not a
> socket, device, etc.) before we read it. However, in a virtual
> environment this check may be failing.
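
For illustration only, the kind of "is it a regular file" check described above can be sketched as follows. This is not OSSEC's code (syscheck is written in C); the function names and the read limit are hypothetical, and the sketch only shows the stat-then-read idea.

```python
import os
import stat

def is_regular_file(path):
    """Return True only for a regular file (not a socket, device, FIFO, directory or symlink)."""
    try:
        st = os.lstat(path)  # lstat inspects the entry itself and does not follow symlinks
    except OSError:
        return False         # missing or inaccessible entries are treated as "do not read"
    return stat.S_ISREG(st.st_mode)

def read_if_regular(path, limit=1 << 20):
    """Read up to `limit` bytes, skipping anything that is not a regular file."""
    if not is_regular_file(path):
        return None
    with open(path, "rb") as fh:
        return fh.read(limit)
```

If a virtualized filesystem reports a special file as regular, a check like this would pass and the read that follows could block, which would match the symptom described.
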
> 
> Can you try the following? Before it starts to run out of control,
> start strace to see what is happening. For example:
> 
> r...@ourhome:/root# ps auwx |grep syscheckd
> root     26897  1.4  0.0   2028   524 ?        R    17:05   0:02 /var/ossec/bin/ossec-syscheckd
> 
> r...@ourhome:/root# strace -F -T -p 26897
> 
> After a while, it should show us where it gets stuck.
> 
> * You may also want to redirect the output to a file:
>   strace -F -T -p pid > /tmp/log 2>&1
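
A trace captured this way can get large. As a rough sketch, a log written with `-T` could be reduced to its slowest calls and its last few lines before sharing it. The function name and the regular expression below are assumptions about the usual `strace -T` line layout (a trailing `<seconds>` field, optionally prefixed with `[pid N]`), not part of strace or OSSEC.

```python
import heapq
import re
from collections import deque

# Intended to match completed calls such as:
#   read(3, "abc", 4096) = 3 <0.000034>
#   [pid 26897] open("/etc/passwd", O_RDONLY) = 3 <0.000012>
CALL_RE = re.compile(r'^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*=\s.*<(\d+\.\d+)>\s*$')

def summarize(lines, top=5, tail=5):
    """Return (slowest completed calls, last lines seen) from strace -T output."""
    timed = []
    last = deque(maxlen=tail)  # the final lines show where the process stopped
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        last.append(line)
        m = CALL_RE.match(line)
        if m:  # unfinished calls have no duration and are only kept in `last`
            timed.append((float(m.group(2)), m.group(1), line))
    slowest = heapq.nlargest(top, timed, key=lambda item: item[0])
    return slowest, list(last)
```

Usage, assuming the trace was redirected to /tmp/log as above: `with open("/tmp/log", errors="replace") as fh: slowest, last = summarize(fh)`. A call that never returns shows up as an unfinished line at the end of the log rather than among the timed calls.
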
> 
> Let us know the results and we will try to track down the issue.
> 
> Thanks,
> 
> --
> Daniel B. Cid
> dcid ( at ) ossec.net
> 
> On Wed, Mar 18, 2009 at 1:50 PM, John A. Sullivan III
> <[email protected]> wrote:
> >
> > Hello, all.  We are suddenly having a bit of a nightmare with our
> > otherwise usually delightful OSSEC.  We've installed it on a dual
> > quad-core AMD server with 32GB of RAM running CentOS 5.2 but with kernel
> > 2.6.28.7 (the CentOS kernel panics with open-iscsi) and VServer
> > 2.3.0.36.7.
> >
> > After a while, a syscheckd process spins completely out of control
> > consuming 100% of one processor.  It refuses to die.  kill does not
> > work, kill -9 does not work, service ossec stop does not work.  Only
> > rebooting seems to work.  The console is flooded with:
> > BUG: soft lockup - CPU#3 stuck for 61s! [ossec-syscheckd:4625]
> >
> > The VServer host (the source of the runaway process) is an OSSEC agent.
> > Originally, the OSSEC server was running as one of its guests but we
> > thought that was the problem.  We moved the OSSEC server to another
> > piece of hardware yet the problem has persisted.
> >
> > We are using OSSEC http://www.ossec.net/files/ossec-hids-2.0.tar.gz
> > downloaded today.  Checksum matched.
> >
> > Here is the log since the last start.  Notice that it thinks syscheckd
> > has stopped:
> > 2009/03/18 11:55:43 ossec-execd: INFO: Started (pid: 4613).
> > 2009/03/18 11:55:43 ossec-agentd(1410): INFO: Reading authentication keys file.
> > 2009/03/18 11:55:43 ossec-agentd: INFO: No previous counter available for 'vserver'.
> > 2009/03/18 11:55:43 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
> > 2009/03/18 11:55:43 ossec-agentd: INFO: Assigning sender counter: 3:3930
> > 2009/03/18 11:55:43 ossec-agentd: INFO: Started (pid: 4617).
> > 2009/03/18 11:55:43 ossec-agentd: INFO: Server IP Address: 172.x.x.30
> > 2009/03/18 11:55:43 ossec-agentd: INFO: Trying to connect to server (172.x.x.30:1514).
> > 2009/03/18 11:55:44 ossec-agentd(4102): INFO: Connected to the server (172.x.x.30:1514).
> > 2009/03/18 11:55:47 ossec-syscheckd: INFO: Started (pid: 4625).
> > 2009/03/18 11:55:47 ossec-rootcheck: INFO: Started (pid: 4625).
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/messages'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/messages'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/messages'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/secure'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/secure'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/secure'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/maillog'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/maillog'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/maillog'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/cron'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/cron'.
> > 2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/cron'.
> > 2009/03/18 11:55:49 ossec-logcollector: INFO: Started (pid: 4621).
> > 2009/03/18 11:58:28 ossec-syscheckd: Error opening directory: '/user/local/sbin': No such file or directory
> > 2009/03/18 11:59:13 ossec-syscheckd: Error opening directory: '/vservers/ns02/user/local/sbin': No such file or directory
> > 2009/03/18 12:01:13 ossec-syscheckd: INFO: Starting syscheck scan (db).
> > 2009/03/18 12:09:53 ossec-syscheckd: INFO: Ending syscheck scan (db).
> > 2009/03/18 12:10:13 ossec-rootcheck: INFO: Starting rootcheck scan.
> >
> >
> > Here is ossec.conf on the VServer host:
> > <ossec_config>
> >  <client>
> >    <server-ip>172.30.10.30</server-ip>
> >  </client>
> >
> >  <syscheck>
> >    <!-- Frequency that syscheck is executed - default to every 6 hours -->
> >    <frequency>21600</frequency>
> >    <alert_new_files>yes</alert_new_files>
> >
> >    <!-- Directories to check  (perform all possible verifications) -->
> >    <directories check_all="yes">/etc,/usr/bin,/usr/sbin</directories>
> >    <directories check_all="yes">/bin,/sbin,/usr/local/bin,/user/local/sbin,/usr/local/etc</directories>
> >    <directories check_all="yes">/vservers/ns02/etc,/vservers/ns02/usr/bin,/vservers/ns02/usr/sbin</directories>
> >    <directories check_all="yes">/vservers/ns02/bin,/vservers/ns02/sbin,/vservers/ns02/usr/local/bin,/vservers/ns02/user/local/sbin,/vservers/ns02/usr/local/etc</directories>
> >
> >    <!-- Files/directories to ignore -->
> >    <ignore>/etc/mtab</ignore>
> >    <ignore>/etc/mnttab</ignore>
> >    <ignore>/etc/hosts.deny</ignore>
> >    <ignore>/etc/mail/statistics</ignore>
> >    <ignore>/etc/random-seed</ignore>
> >    <ignore>/etc/adjtime</ignore>
> >    <ignore>/etc/httpd/logs</ignore>
> >    <ignore>/etc/utmpx</ignore>
> >    <ignore>/etc/wtmpx</ignore>
> >    <ignore>/etc/cups/certs</ignore>
> >    <ignore>/etc/dumpdates</ignore>
> >    <ignore>/etc/svc/volatile</ignore>
> >
> >    <!-- Windows files to ignore -->
> >    <ignore>C:\WINDOWS/System32/LogFiles</ignore>
> >    <ignore>C:\WINDOWS/Debug</ignore>
> >    <ignore>C:\WINDOWS/WindowsUpdate.log</ignore>
> >    <ignore>C:\WINDOWS/iis6.log</ignore>
> >    <ignore>C:\WINDOWS/system32/wbem/Logs</ignore>
> >    <ignore>C:\WINDOWS/system32/wbem/Repository</ignore>
> >    <ignore>C:\WINDOWS/Prefetch</ignore>
> >    <ignore>C:\WINDOWS/PCHEALTH/HELPCTR/DataColl</ignore>
> >    <ignore>C:\WINDOWS/SoftwareDistribution</ignore>
> >    <ignore>C:\WINDOWS/Temp</ignore>
> >    <ignore>C:\WINDOWS/system32/config</ignore>
> >    <ignore>C:\WINDOWS/system32/spool</ignore>
> >    <ignore>C:\WINDOWS/system32/CatRoot</ignore>
> >  </syscheck>
> >
> >  <rootcheck>
> >    <rootkit_files>/usr/local/ossec/etc/shared/rootkit_files.txt</rootkit_files>
> >    <rootkit_trojans>/usr/local/ossec/etc/shared/rootkit_trojans.txt</rootkit_trojans>
> >    <system_audit>/usr/local/ossec/etc/shared/system_audit_rcl.txt</system_audit>
> >    <system_audit>/usr/local/ossec/etc/shared/cis_debian_linux_rcl.txt</system_audit>
> >    <system_audit>/usr/local/ossec/etc/shared/cis_rhel_linux_rcl.txt</system_audit>
> >    <system_audit>/usr/local/ossec/etc/shared/cis_rhel5_linux_rcl.txt</system_audit>
> >  </rootcheck>
> >  <!-- Files to monitor (localfiles) -->
> >
> >  <localfile>
> >    <log_format>syslog</log_format>
> >    <location>/var/log/messages</location>
> >  </localfile>
> >
> >  <localfile>
> >    <log_format>syslog</log_format>
> >    <location>/var/log/secure</location>
> >  </localfile>
> >
> >  <localfile>
> >    <log_format>syslog</log_format>
> >    <location>/var/log/maillog</location>
> >  </localfile>
> >
> >  <localfile>
> >    <log_format>syslog</log_format>
> >    <location>/var/log/cron</location>
> >  </localfile>
> >
> >  <localfile>
> >    <log_format>syslog</log_format>
> >    <location>/vservers/[a-zA-Z0-9]*/var/log/messages</location>
> >  </localfile>
> >
> >  <localfile>
> >    <log_format>syslog</log_format>
> >    <location>/vservers/[a-zA-Z0-9]*/var/log/secure</location>
> >  </localfile>
> >
> >  <localfile>
> >    <log_format>syslog</log_format>
> >    <location>/vservers/[a-zA-Z0-9]*/var/log/maillog</location>
> >  </localfile>
> >
> >  <localfile>
> >    <log_format>syslog</log_format>
> >    <location>/vservers/[a-zA-Z0-9]*/var/log/cron</location>
> >  </localfile>
> > </ossec_config>
> >
> > Any idea what is causing this? How to kill the process without
> > rebooting? How to fix it?
> >
> > We're starting to fall behind on this critical project so any help is
> > greatly appreciated.  Thanks - John
> >
> > --
> > John A. Sullivan III
> > Open Source Development Corporation
> > +1 207-985-7880
> > [email protected]
> >
> > http://www.spiritualoutreach.com
> > Making Christianity intelligible to secular society
> >
> >
-- 
John A. Sullivan III
Open Source Development Corporation
+1 207-985-7880
[email protected]

http://www.spiritualoutreach.com
Making Christianity intelligible to secular society
