Hi David,
Yes. The 'H' was turned on for the TOP info.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
49.99 0.008990 5 1858 929 futex
33.34 0.005996 857 7 select
16.67 0.002997 20 151 read
0.00 0.000000 0 542 recvfrom
0.00 0.000000 0 7 recvmsg
0.00 0.000000 0 468 gettimeofday
0.00 0.000000 0 1860 clock_gettime
------ ----------- ----------- --------- --------- ----------------
100.00 0.017983 4893 929 total
As seen from the above TOP info,there are 929 errors for futex
call.Sometimes there were errors for recvmsg call where I did more
straces.Does this relate to the CPU high load?
I'll do more tests as your suggestion.
Thanks,
Henry
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, May 10, 2011 8:29 PM
To: rsyslog-users
Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to Oracle
thanks for the info.
when you printed the TOP info, was that with 'H' (thread details) turned on?
I'm puzzled by the fact that you are doing a strace of the thread 13470 and
it's showing actions by many different pids (it looks like everything except
13470
can you try doing
strace -ff -o rsyslog.strace. -C -ttt -T -s 100 rsyslogd -c5 -nd
>rsyslog.debug
this will create a bunch of strace files. we shouldn't need to look at all
of them, just the one using all the CPU (and possibly only the end of that
one)
while this is running, do a ps -eLf
also, run top -H and note which thread is using all the CPU (you may want to
try toggling H while it's running to make sure you are getting the thread
info not the consolodated process info)
I'm about to head home and when I get home I'll look at the debug output
that you send.
one thing I'm not remembering. are the logs getting into the database?
David Lang
On Tue, 10 May 2011, bxu wrote:
> Hi David,
> Here are what I'm doing:
> 1. oracle configuration of rsyslogd.conf
>
> ################
> #### ORACLE ####
> ################
>
> $ModLoad omoracle
> $OmoracleDBUser syslog
> $OmoracleDBPassword syslog
> $OmoracleDB syslog
> $OmoracleBatchSize 1
> $OmoracleBatchItemSize 4096
> $OmoracleStatementTemplate OmoracleStatement
>
> $template OmoracleStatement,"INSERT INTO TEST(Hostname,Message)
> VALUES(:hostname,:msg)"
> $template TestStmt,"%hostname%%msg%"
> $template TestStmtFile,"INSERT INTO TEST(Hostname,Message)
> VALUES('%hostname%','%msg%')\n"
>
> #$WorkDirectory /rsyslog/work # default location for work (spool)
> files
>
> #$ActionQueueType LinkedList # use asynchronous processing
> #$ActionQueueFileName dbq # set file name, also enables disk mode
> #$ActionResumeRetryCount -1 # infinite retries on insert failure
>
> :msg,contains,"SIP" :omoracle:;TestStmt
>
> & ~
>
> *.* /var/log/syslog-test;TestStmtFile
>
> 2. running command : rsyslogd -c5 -nd > tt &
>
> 3, TOP information
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>
> 13470 root 25 0 244m 11m 7636 R 99.9 0.1 1:06.39 rs:main Q:Reg
>
> 11410 root 15 0 12768 1144 832 R 0.3 0.0 0:13.52 top
>
> 1 root 15 0 10372 692 580 S 0.0 0.0 0:00.51 init
>
> 4. when CPU usage is keeping at 100%,the summary of strace by thread
> 13470:
>
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 51.26 0.019996 9998 2 epoll_wait
> 46.12 0.017990 1799 10 read
> 2.56 0.000999 999 1 select
> 0.06 0.000022 2 12 2 recvfrom
> 0.00 0.000000 0 64 write
> 0.00 0.000000 0 1 recvmsg
> 0.00 0.000000 0 17 gettimeofday
> 0.00 0.000000 0 62 clock_gettime
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.039007 169 2 total
>
> 5. when CPU usage is keeping at 100%,some of the call sequence of
> strace by thread 13470:
>
> [pid 13469] 0.000000 select(1, NULL, NULL, NULL, {86097, 801000}
> <unfinished ...>
> [pid 13473] 0.000089 clock_gettime(CLOCK_REALTIME, <unfinished ...>
> [pid 13472] 0.000054 select(9, [8], NULL, NULL, NULL <unfinished ...>
> [pid 13471] 0.000066 read(3, <unfinished ...>
> [pid 13473] 0.000048 <... clock_gettime resumed> {1305046294,
> 274947000}) = 0 <0.000138>
> [pid 13473] 0.000067 write(1, "6294.274947000:43899940: imudp: "...,
> 66) = 66 <0.000042>
> [pid 13473] 0.000137 epoll_wait(12, <unfinished ...>
> [pid 13471] 1.421497 <... read resumed> "<6>iptables: log and drop
> IN=eth"..., 4095) = 200 <1.421706>
> [pid 13471] 0.000095 gettimeofday({1305046295, 696914}, NULL) = 0
> <0.000022>
> [pid 13471] 0.000130 clock_gettime(CLOCK_REALTIME, {1305046295,
> 697046000}) = 0 <0.000022>
> [pid 13471] 0.000104 write(1, "6295.697046000:42497940: main Q:"...,
> 81) = 81 <0.000037>
> [pid 13471] 0.000127 clock_gettime(CLOCK_REALTIME, {1305046295,
> 697276000}) = 0 <0.000020>
> [pid 13471] 0.000102 write(1, "6295.697276000:42497940: main Q:"...,
> 65) = 65 <0.000031>
> [pid 13471] 0.000120 read(3, "<6>iptables: log and drop IN=eth"...,
> 4095) = 200 <4.250732>
> [pid 13471] 4.250863 gettimeofday({1305046299, 948362}, NULL) = 0
> <0.000021>
> [pid 13471] 0.000129 clock_gettime(CLOCK_REALTIME, {1305046299,
> 948491000}) = 0 <0.000021>
> [pid 13471] 0.000106 write(1, "6299.948491000:42497940: main Q:"...,
> 81) = 81 <0.000051>
> [pid 13471] 0.000140 clock_gettime(CLOCK_REALTIME, {1305046299,
> 948735000}) = 0 <0.000021>
> [pid 13471] 0.000101 write(1, "6299.948735000:42497940: main Q:"...,
> 65) = 65 <0.000031>
> [pid 13471] 0.000123 read(3, "<6>iptables: log and drop IN=eth"...,
> 4095) = 200 <2.178833>
> [pid 13471] 2.178931 gettimeofday({1305046302, 127891}, NULL) = 0
> <0.000022>
> [pid 13471] 0.000112 clock_gettime(CLOCK_REALTIME, {1305046302,
> 128003000}) = 0 <0.000021>
> [pid 13471] 0.000101 write(1, "6302.128003000:42497940: main Q:"...,
> 81) = 81 <0.000036>
> [pid 13471] 0.000121 clock_gettime(CLOCK_REALTIME, {1305046302,
> 128225000}) = 0 <0.000021>
> [pid 13471] 0.000102 write(1, "6302.128225000:42497940: main Q:"...,
> 65) = 65 <0.000031>
> [pid 13471] 0.000115 read(3, "<6>iptables: log and drop IN=eth"...,
> 4095) = 200 <2.431092>
> [pid 13471] 2.431228 gettimeofday({1305046304, 559672}, NULL) = 0
> <0.000022>
> [pid 13471] 0.000132 clock_gettime(CLOCK_REALTIME, {1305046304,
> 559803000}) = 0 <0.000021>
> [pid 13471] 0.000110 write(1, "6304.559803000:42497940: main Q:"...,
> 81) = 81 <0.000052>
> [pid 13471] 0.000140 clock_gettime(CLOCK_REALTIME, {1305046304,
> 560052000}) = 0 <0.000021>
> [pid 13471] 0.000102 write(1, "6304.560052000:42497940: main Q:"...,
> 65) = 65 <0.000031>
> [pid 13471] 0.000119 read(3, "<6>iptables: log and drop IN=eth"...,
> 4095) = 197 <0.678610>
> [pid 13471] 0.678704 gettimeofday({1305046305, 238976}, NULL) = 0
> <0.000021>
> [pid 13471] 0.000110 clock_gettime(CLOCK_REALTIME, {1305046305,
> 239087000}) = 0 <0.000024>
> [pid 13471] 0.000105 write(1, "6305.239087000:42497940: main Q:"...,
> 81) = 81 <0.000035>
> [pid 13471] 0.000120 clock_gettime(CLOCK_REALTIME, {1305046305,
> 239312000}) = 0 <0.000020>
> [pid 13471] 0.000102 write(1, "6305.239312000:42497940: main Q:"...,
> 65) = 65 <0.000030>
> [pid 13471] 0.000114 read(3, "<6>iptables: log and drop IN=eth"...,
> 4095) = 200 <3.058797>
> [pid 13471] 3.058929 gettimeofday({1305046308, 298458}, NULL) = 0
> <0.000022>
> [pid 13471] 0.000131 clock_gettime(CLOCK_REALTIME, {1305046308,
> 298589000}) = 0 <0.000021>
> [pid 13471] 0.000106 write(1, "6308.298589000:42497940: main Q:"...,
> 81) = 81 <0.000058>
> [pid 13471] 0.000148 clock_gettime(CLOCK_REALTIME, {1305046308,
> 298841000}) = 0 <0.000021>
> [pid 13471] 0.000101 write(1, "6308.298841000:42497940: main Q:"...,
> 65) = 65 <0.000093>
> [pid 13471] 0.000181 read(3, "<6>iptables: log and drop IN=eth"...,
> 4095) = 197 <0.213955>
> [pid 13471] 0.214082 gettimeofday({1305046308, 513206}, NULL) = 0
> <0.000021>
> [pid 13471] 0.000128 clock_gettime(CLOCK_REALTIME, {1305046308,
> 513335000}) = 0 <0.000021>
> [pid 13471] 0.000106 write(1, "6308.513335000:42497940: main Q:"...,
> 81) = 81 <0.000042>
> [pid 13471] 0.000130 clock_gettime(CLOCK_REALTIME, {1305046308,
> 513570000}) = 0 <0.000021>
> [pid 13471] 0.000102 write(1, "6308.513570000:42497940: main Q:"...,
> 65) = 65 <0.000030>
> [pid 13471] 0.000122 read(3, "<6>iptables: log and drop IN=eth"...,
> 4095) = 196 <0.835091>
> [pid 13471] 0.835187 gettimeofday({1305046309, 348980}, NULL) = 0
> <0.000021>
> [pid 13471] 0.000110 clock_gettime(CLOCK_REALTIME, {1305046309,
> 349091000}) = 0 <0.000021>
> [pid 13471] 0.000102 write(1, "6309.349091000:42497940: main Q:"...,
> 81) = 81 <0.000035>
> [pid 13471] 0.000121 clock_gettime(CLOCK_REALTIME, {1305046309,
> 349313000}) = 0 <0.000020>
> [pid 13471] 0.000102 write(1, "6309.349313000:42497940: main Q:"...,
> 65) = 65 <0.000030>
> [pid 13471] 0.000114 read(3, <unfinished ...>
> [pid 13473] 2.018865 <... epoll_wait resumed> {{EPOLLIN, {u32=2,
> u64=2}}}, 10, 4294967295) = 1 <17.093033>
> [pid 13473] 0.000086 clock_gettime(CLOCK_REALTIME, {1305046311,
> 368481000}) = 0 <0.000020>
> [pid 13473] 0.000098 write(1, "6311.368481000:43899940: imudp: "...,
> 65) = 65 <0.000036>
>
>
> 6. Attached is the log file while rsyslogd is running in debug mode
> In the log,after "omoracle insertion to DB succeeded", there's
no
> more oracle log information related.
>
>
> Thank you very much.
>
> Best,
> Henry
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Tuesday, May 10, 2011 1:19 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to
> Oracle
>
> rsyslog uses multiple threads, if you can tell which thread is using
> all the CPU and then do a strace of it and dump the sequence of calls
> that it makes to process one of the messages it may help
>
> I don't have any idea what's going on here, so I'm just working
> through the steps that I would be doing if I was running into the
> problem
>
> try ps -eLf to list the different threads
>
> in top, hit 'H' to run on per-thread reporting (at which point you may
> get some additional info on the right about what the thread is doing)
>
> from the stats you are showing here, it looks like the thread you got
> with your strace is the parent that's receiving messages (lots of
> gettimeofday and receive/read type calls) the correct one should have
> write calls in it (among others)
>
> the versions you are running are new enough that we shouldn't have any
> old, solved problems hitting us.
>
> On Tue, 10 May 2011, bxu
> wrote:
>
>> Hi David,
>> The rsyslogd version I tried are 5.8.0(stable) and 6.1.7(devel).
>> I straced rsyslogd when CPU usage is keeping at 100%.
>>
>> % time seconds usecs/call calls errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>> 43.47 0.066967 2309 29 read
>> 38.28 0.058979 855 69 epoll_wait
>> 18.17 0.027994 9331 3 select
>> 0.05 0.000081 1 114 gettimeofday
>> 0.03 0.000041 0 178 69 recvfrom
>> 0.00 0.000000 0 3 recvmsg
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00 0.154062 396 69 total
>>
>>
>> Best,
>> Henry
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of [email protected]
>> Sent: Tuesday, May 10, 2011 12:35 AM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to
>> Oracle
>>
>> I accidently deleted your reply, so I am replying to my message
>>
>> 4-5 messages/secshould not be any problem at all.
>>
>> since writing a copy of the sql commands to a file doesn't show
>> anything strange, the next step will be to try and run it in debug
>> mode to see if it logs anything strange as you start having problems.
>>
>> the next step after that will probably be to do a strace of the
>> thread that is using all the CPU
>>
>> I don't see where you mention what version of rsyslog you are using.
>>
>> David Lang
>>
>> On Mon, 9 May 2011, [email protected] wrote:
>>
>>> Date: Mon, 9 May 2011 10:56:30 -0700 (PDT)
>>> From: [email protected]
>>> Reply-To: rsyslog-users <[email protected]>
>>> To: rsyslog-users <[email protected]>
>>> Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to
>>> Oracle
>>>
>>> given that it only happens after a few messages, I would suspect
>>> that there is something odd with the messages.
>>>
>>> could you try writing the messages to a file using the Oracle
>>> template and then look and see if there is anything odd about the
>>> messages when it goes nuts?
>>>
>>> the other option is to get a debug dump, but unless it happens right
>>> away, that is likely to be a large dump.
>>>
>>> David Lang
>>>
>>> On Mon, 9 May 2011, bxu wrote:
>>>
>>>> Date: Mon, 9 May 2011 10:48:50 -0500
>>>> From: bxu <[email protected]>
>>>> Reply-To: rsyslog-users <[email protected]>
>>>> To: [email protected]
>>>> Subject: [rsyslog] help : CPU high load of rsyslog writing to
>>>> Oracle
>>>>
>>>> Hi All,
>>>>
>>>>
>>>> I have configured rsyslog writing to Oracle.The writing was
>>>> successful.But after several messages have been made into
>>>> Oracle,the CPU use percentage of rsyslogd goes up to 99%-100%.It is
>>>> abnormally too
>> much high.
>>>> There's no problem with the cpu usage when writing to files.
>>>>
>>>> Here is my rsyslogd.conf:
>>>> #====================================================
>>>> #*.* >192.168.45.213,rsyslog
>>>>
>>>> # Use traditional timestamp format
>>>> $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
>>>>
>>>> # Provides kernel logging support (previously done by rklogd)
>>>> $ModLoad imklog # Provides support for local system logging (e.g.
>>>> via logger command) $ModLoad imuxsock
>>>>
>>>> # Log all kernel messages to the console.
>>>> # Logging much else clutters up the screen.
>>>> #kern.* /dev/console
>>>>
>>>> # Log anything (except mail) of level info or higher.
>>>> # Don't log private authentication messages!
>>>> *.info;mail.none;authpriv.none;cron.none /var/log/messages
>>>>
>>>> # The authpriv file has restricted access.
>>>> authpriv.* /var/log/secure
>>>>
>>>> # Log all the mail messages in one place.
>>>> mail.* -/var/log/maillog
>>>>
>>>> # Log cron stuff
>>>> cron.* /var/log/cron
>>>>
>>>> # Everybody gets emergency messages *.emerg *
>>>>
>>>> # Save news errors of level crit and higher in a special file.
>>>> uucp,news.crit /var/log/spooler
>>>>
>>>> # Save boot messages also to boot.log
>>>> local7.* /var/log/boot.log
>>>>
>>>> #Load External Listender
>>>> $modload imudp
>>>>
>>>>
>>>> $UDPServerRun 514
>>>> #SPA Messages
>>>> #*.* /var/log/linksys.log
>>>> $template DynaFile,"/var/log/system-%HOSTNAME%.log"
>>>> *.* -?DynaFile
>>>>
>>>>
>>>> ################
>>>> #### ORACLE ####
>>>> ################
>>>>
>>>> $ModLoad omoracle
>>>> $OmoracleDBUser syslog
>>>> $OmoracleDBPassword syslog
>>>> $OmoracleDB syslog
>>>> $OmoracleBatchSize 1
>>>> $OmoracleBatchItemSize 4096
>>>> $OmoracleStatementTemplate OmoracleStatement
>>>>
>>>> $template OmoracleStatement,"INSERT INTO TEST(hostname,message)
>>>> VALUES(:hostname,:msg)"
>>>> $template TestStmt,"%hostname%%msg%"
>>>>
>>>> *.* :omoracle:;TestStmt
>>>> #====================================================
>>>>
>>>> Really appreciate anyone's help!
>>>>
>>>> Best
>>>>
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com
>>>>
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com
>>>
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
>>
>> --
>> This message has been scanned for viruses and dangerous content by
>> MailScanner, and is believed to be clean.
>>
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
>>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
>
> --
> This message has been scanned for viruses and dangerous content by
> MailScanner, and is believed to be clean.
>
--
This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com