It would possibly be useful if you gave rsyslog another 5 seconds between the first ctl-c and the successive ones. I then could see how the thread term is handled, which gives some further clues. However, this is by far not as informative as the other things I suggested, so please combine both of it.
Thanks, Rainer > -----Original Message----- > From: [email protected] [mailto:rsyslog- > [email protected]] On Behalf Of Henry Xu > Sent: Thursday, May 12, 2011 7:03 AM > To: 'rsyslog-users' > Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to > Oracle > > Attached is another trace of rsyslogd with thread ID 2665 which hosted > all > the CPU. > That's all the log info. > > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of [email protected] > Sent: Wednesday, May 11, 2011 11:23 PM > To: rsyslog-users > Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to > Oracle > > this should point us at the problem (although I think Rainer will need > to > look at this). > > one minor correction, if the thread was blocked it would be eating zero > CPU, > not 100%, so it probably went into some loop that is completly in > userspace > after getting that response. > > it may (or may not) help to get more of the rsyslog.strace..23298 file, > how > large is it? > > David Lang > > On Wed, 11 May 2011, bxu wrote: > > > Attached is the debug file when use " strace -ff -o rsyslog.strace. - > C > > -ttt -T -s 100 rsyslogd -c5 -nd >rsyslog.debug". > > Thread ID 23298 consumes almost all the CPU time,which is used to > > write to Oracle. > > In the file 'rsyslog.strace..23298',there's no more log after " > > 1305142850.448202 write(1, "action call returned 0\n", 23) = 23 > > <0.000027>",seems this thread was blocked. > > I think there is a loop after some wrong. > > > > -----Original Message----- > > From: bxu [mailto:[email protected]] > > Sent: Wednesday, May 11, 2011 1:15 AM > > To: 'rsyslog-users' > > Subject: RE: [rsyslog] help : CPU high load of rsyslog writing to > > Oracle > > > > Hi David, > > Yes. The 'H' was turned on for the TOP info. > > > > % time seconds usecs/call calls errors syscall > > ------ ----------- ----------- --------- --------- ---------------- > > 49.99 0.008990 5 1858 929 futex > > 33.34 0.005996 857 7 select > > 16.67 0.002997 20 151 read > > 0.00 0.000000 0 542 recvfrom > > 0.00 0.000000 0 7 recvmsg > > 0.00 0.000000 0 468 gettimeofday > > 0.00 0.000000 0 1860 clock_gettime > > ------ ----------- ----------- --------- --------- ---------------- > > 100.00 0.017983 4893 929 total > > > > As seen from the above TOP info,there are 929 errors for futex > > call.Sometimes there were errors for recvmsg call where I did more > > straces.Does this relate to the CPU high load? > > > > I'll do more tests as your suggestion. > > > > Thanks, > > Henry > > > > -----Original Message----- > > From: [email protected] > > [mailto:[email protected]] On Behalf Of [email protected] > > Sent: Tuesday, May 10, 2011 8:29 PM > > To: rsyslog-users > > Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to > > Oracle > > > > thanks for the info. > > > > when you printed the TOP info, was that with 'H' (thread details) > turned > on? > > > > I'm puzzled by the fact that you are doing a strace of the thread > > 13470 and it's showing actions by many different pids (it looks like > > everything except > > 13470 > > > > can you try doing > > > > strace -ff -o rsyslog.strace. -C -ttt -T -s 100 rsyslogd -c5 -nd > >> rsyslog.debug > > > > this will create a bunch of strace files. we shouldn't need to look > at > > all of them, just the one using all the CPU (and possibly only the > end > > of that > > one) > > > > while this is running, do a ps -eLf > > > > also, run top -H and note which thread is using all the CPU (you may > > want to try toggling H while it's running to make sure you are > getting > > the thread info not the consolodated process info) > > > > I'm about to head home and when I get home I'll look at the debug > > output that you send. > > > > one thing I'm not remembering. are the logs getting into the > database? > > > > David Lang > > > > On Tue, 10 May 2011, bxu wrote: > > > >> Hi David, > >> Here are what I'm doing: > >> 1. oracle configuration of rsyslogd.conf > >> > >> ################ > >> #### ORACLE #### > >> ################ > >> > >> $ModLoad omoracle > >> $OmoracleDBUser syslog > >> $OmoracleDBPassword syslog > >> $OmoracleDB syslog > >> $OmoracleBatchSize 1 > >> $OmoracleBatchItemSize 4096 > >> $OmoracleStatementTemplate OmoracleStatement > >> > >> $template OmoracleStatement,"INSERT INTO TEST(Hostname,Message) > >> VALUES(:hostname,:msg)" > >> $template TestStmt,"%hostname%%msg%" > >> $template TestStmtFile,"INSERT INTO TEST(Hostname,Message) > >> VALUES('%hostname%','%msg%')\n" > >> > >> #$WorkDirectory /rsyslog/work # default location for work (spool) > >> files > >> > >> #$ActionQueueType LinkedList # use asynchronous processing > >> #$ActionQueueFileName dbq # set file name, also enables disk mode > >> #$ActionResumeRetryCount -1 # infinite retries on insert failure > >> > >> :msg,contains,"SIP" :omoracle:;TestStmt > >> > >> & ~ > >> > >> *.* /var/log/syslog-test;TestStmtFile > >> > >> 2. running command : rsyslogd -c5 -nd > tt & > >> > >> 3, TOP information > >> > >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > >> > >> 13470 root 25 0 244m 11m 7636 R 99.9 0.1 1:06.39 rs:main > Q:Reg > >> > >> 11410 root 15 0 12768 1144 832 R 0.3 0.0 0:13.52 top > >> > >> 1 root 15 0 10372 692 580 S 0.0 0.0 0:00.51 init > >> > >> 4. when CPU usage is keeping at 100%,the summary of strace by > thread > >> 13470: > >> > >> % time seconds usecs/call calls errors syscall > >> ------ ----------- ----------- --------- --------- ---------------- > >> 51.26 0.019996 9998 2 epoll_wait > >> 46.12 0.017990 1799 10 read > >> 2.56 0.000999 999 1 select > >> 0.06 0.000022 2 12 2 recvfrom > >> 0.00 0.000000 0 64 write > >> 0.00 0.000000 0 1 recvmsg > >> 0.00 0.000000 0 17 gettimeofday > >> 0.00 0.000000 0 62 clock_gettime > >> ------ ----------- ----------- --------- --------- ---------------- > >> 100.00 0.039007 169 2 total > >> > >> 5. when CPU usage is keeping at 100%,some of the call sequence of > >> strace by thread 13470: > >> > >> [pid 13469] 0.000000 select(1, NULL, NULL, NULL, {86097, > 801000} > >> <unfinished ...> > >> [pid 13473] 0.000089 clock_gettime(CLOCK_REALTIME, <unfinished > ...> > >> [pid 13472] 0.000054 select(9, [8], NULL, NULL, NULL > <unfinished > ...> > >> [pid 13471] 0.000066 read(3, <unfinished ...> > >> [pid 13473] 0.000048 <... clock_gettime resumed> {1305046294, > >> 274947000}) = 0 <0.000138> > >> [pid 13473] 0.000067 write(1, "6294.274947000:43899940: imudp: > "..., > >> 66) = 66 <0.000042> > >> [pid 13473] 0.000137 epoll_wait(12, <unfinished ...> > >> [pid 13471] 1.421497 <... read resumed> "<6>iptables: log and > drop > >> IN=eth"..., 4095) = 200 <1.421706> > >> [pid 13471] 0.000095 gettimeofday({1305046295, 696914}, NULL) = > 0 > >> <0.000022> > >> [pid 13471] 0.000130 clock_gettime(CLOCK_REALTIME, {1305046295, > >> 697046000}) = 0 <0.000022> > >> [pid 13471] 0.000104 write(1, "6295.697046000:42497940: main > Q:"..., > >> 81) = 81 <0.000037> > >> [pid 13471] 0.000127 clock_gettime(CLOCK_REALTIME, {1305046295, > >> 697276000}) = 0 <0.000020> > >> [pid 13471] 0.000102 write(1, "6295.697276000:42497940: main > Q:"..., > >> 65) = 65 <0.000031> > >> [pid 13471] 0.000120 read(3, "<6>iptables: log and drop > IN=eth"..., > >> 4095) = 200 <4.250732> > >> [pid 13471] 4.250863 gettimeofday({1305046299, 948362}, NULL) = > 0 > >> <0.000021> > >> [pid 13471] 0.000129 clock_gettime(CLOCK_REALTIME, {1305046299, > >> 948491000}) = 0 <0.000021> > >> [pid 13471] 0.000106 write(1, "6299.948491000:42497940: main > Q:"..., > >> 81) = 81 <0.000051> > >> [pid 13471] 0.000140 clock_gettime(CLOCK_REALTIME, {1305046299, > >> 948735000}) = 0 <0.000021> > >> [pid 13471] 0.000101 write(1, "6299.948735000:42497940: main > Q:"..., > >> 65) = 65 <0.000031> > >> [pid 13471] 0.000123 read(3, "<6>iptables: log and drop > IN=eth"..., > >> 4095) = 200 <2.178833> > >> [pid 13471] 2.178931 gettimeofday({1305046302, 127891}, NULL) = > 0 > >> <0.000022> > >> [pid 13471] 0.000112 clock_gettime(CLOCK_REALTIME, {1305046302, > >> 128003000}) = 0 <0.000021> > >> [pid 13471] 0.000101 write(1, "6302.128003000:42497940: main > Q:"..., > >> 81) = 81 <0.000036> > >> [pid 13471] 0.000121 clock_gettime(CLOCK_REALTIME, {1305046302, > >> 128225000}) = 0 <0.000021> > >> [pid 13471] 0.000102 write(1, "6302.128225000:42497940: main > Q:"..., > >> 65) = 65 <0.000031> > >> [pid 13471] 0.000115 read(3, "<6>iptables: log and drop > IN=eth"..., > >> 4095) = 200 <2.431092> > >> [pid 13471] 2.431228 gettimeofday({1305046304, 559672}, NULL) = > 0 > >> <0.000022> > >> [pid 13471] 0.000132 clock_gettime(CLOCK_REALTIME, {1305046304, > >> 559803000}) = 0 <0.000021> > >> [pid 13471] 0.000110 write(1, "6304.559803000:42497940: main > Q:"..., > >> 81) = 81 <0.000052> > >> [pid 13471] 0.000140 clock_gettime(CLOCK_REALTIME, {1305046304, > >> 560052000}) = 0 <0.000021> > >> [pid 13471] 0.000102 write(1, "6304.560052000:42497940: main > Q:"..., > >> 65) = 65 <0.000031> > >> [pid 13471] 0.000119 read(3, "<6>iptables: log and drop > IN=eth"..., > >> 4095) = 197 <0.678610> > >> [pid 13471] 0.678704 gettimeofday({1305046305, 238976}, NULL) = > 0 > >> <0.000021> > >> [pid 13471] 0.000110 clock_gettime(CLOCK_REALTIME, {1305046305, > >> 239087000}) = 0 <0.000024> > >> [pid 13471] 0.000105 write(1, "6305.239087000:42497940: main > Q:"..., > >> 81) = 81 <0.000035> > >> [pid 13471] 0.000120 clock_gettime(CLOCK_REALTIME, {1305046305, > >> 239312000}) = 0 <0.000020> > >> [pid 13471] 0.000102 write(1, "6305.239312000:42497940: main > Q:"..., > >> 65) = 65 <0.000030> > >> [pid 13471] 0.000114 read(3, "<6>iptables: log and drop > IN=eth"..., > >> 4095) = 200 <3.058797> > >> [pid 13471] 3.058929 gettimeofday({1305046308, 298458}, NULL) = > 0 > >> <0.000022> > >> [pid 13471] 0.000131 clock_gettime(CLOCK_REALTIME, {1305046308, > >> 298589000}) = 0 <0.000021> > >> [pid 13471] 0.000106 write(1, "6308.298589000:42497940: main > Q:"..., > >> 81) = 81 <0.000058> > >> [pid 13471] 0.000148 clock_gettime(CLOCK_REALTIME, {1305046308, > >> 298841000}) = 0 <0.000021> > >> [pid 13471] 0.000101 write(1, "6308.298841000:42497940: main > Q:"..., > >> 65) = 65 <0.000093> > >> [pid 13471] 0.000181 read(3, "<6>iptables: log and drop > IN=eth"..., > >> 4095) = 197 <0.213955> > >> [pid 13471] 0.214082 gettimeofday({1305046308, 513206}, NULL) = > 0 > >> <0.000021> > >> [pid 13471] 0.000128 clock_gettime(CLOCK_REALTIME, {1305046308, > >> 513335000}) = 0 <0.000021> > >> [pid 13471] 0.000106 write(1, "6308.513335000:42497940: main > Q:"..., > >> 81) = 81 <0.000042> > >> [pid 13471] 0.000130 clock_gettime(CLOCK_REALTIME, {1305046308, > >> 513570000}) = 0 <0.000021> > >> [pid 13471] 0.000102 write(1, "6308.513570000:42497940: main > Q:"..., > >> 65) = 65 <0.000030> > >> [pid 13471] 0.000122 read(3, "<6>iptables: log and drop > IN=eth"..., > >> 4095) = 196 <0.835091> > >> [pid 13471] 0.835187 gettimeofday({1305046309, 348980}, NULL) = > 0 > >> <0.000021> > >> [pid 13471] 0.000110 clock_gettime(CLOCK_REALTIME, {1305046309, > >> 349091000}) = 0 <0.000021> > >> [pid 13471] 0.000102 write(1, "6309.349091000:42497940: main > Q:"..., > >> 81) = 81 <0.000035> > >> [pid 13471] 0.000121 clock_gettime(CLOCK_REALTIME, {1305046309, > >> 349313000}) = 0 <0.000020> > >> [pid 13471] 0.000102 write(1, "6309.349313000:42497940: main > Q:"..., > >> 65) = 65 <0.000030> > >> [pid 13471] 0.000114 read(3, <unfinished ...> > >> [pid 13473] 2.018865 <... epoll_wait resumed> {{EPOLLIN, > {u32=2, > >> u64=2}}}, 10, 4294967295) = 1 <17.093033> > >> [pid 13473] 0.000086 clock_gettime(CLOCK_REALTIME, {1305046311, > >> 368481000}) = 0 <0.000020> > >> [pid 13473] 0.000098 write(1, "6311.368481000:43899940: imudp: > "..., > >> 65) = 65 <0.000036> > >> > >> > >> 6. Attached is the log file while rsyslogd is running in debug > mode > >> In the log,after "omoracle insertion to DB succeeded", > there's > > no > >> more oracle log information related. > >> > >> > >> Thank you very much. > >> > >> Best, > >> Henry > >> > >> -----Original Message----- > >> From: [email protected] > >> [mailto:[email protected]] On Behalf Of > [email protected] > >> Sent: Tuesday, May 10, 2011 1:19 AM > >> To: rsyslog-users > >> Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to > >> Oracle > >> > >> rsyslog uses multiple threads, if you can tell which thread is using > >> all the CPU and then do a strace of it and dump the sequence of > calls > >> that it makes to process one of the messages it may help > >> > >> I don't have any idea what's going on here, so I'm just working > >> through the steps that I would be doing if I was running into the > >> problem > >> > >> try ps -eLf to list the different threads > >> > >> in top, hit 'H' to run on per-thread reporting (at which point you > >> may get some additional info on the right about what the thread is > >> doing) > >> > >> from the stats you are showing here, it looks like the thread you > got > >> with your strace is the parent that's receiving messages (lots of > >> gettimeofday and receive/read type calls) the correct one should > have > >> write calls in it (among others) > >> > >> the versions you are running are new enough that we shouldn't have > >> any old, solved problems hitting us. > >> > >> On Tue, 10 May 2011, bxu > >> wrote: > >> > >>> Hi David, > >>> The rsyslogd version I tried are 5.8.0(stable) and 6.1.7(devel). > >>> I straced rsyslogd when CPU usage is keeping at 100%. > >>> > >>> % time seconds usecs/call calls errors syscall > >>> ------ ----------- ----------- --------- --------- ---------------- > >>> 43.47 0.066967 2309 29 read > >>> 38.28 0.058979 855 69 epoll_wait > >>> 18.17 0.027994 9331 3 select > >>> 0.05 0.000081 1 114 gettimeofday > >>> 0.03 0.000041 0 178 69 recvfrom > >>> 0.00 0.000000 0 3 recvmsg > >>> ------ ----------- ----------- --------- --------- ---------------- > >>> 100.00 0.154062 396 69 total > >>> > >>> > >>> Best, > >>> Henry > >>> > >>> -----Original Message----- > >>> From: [email protected] > >>> [mailto:[email protected]] On Behalf Of > >>> [email protected] > >>> Sent: Tuesday, May 10, 2011 12:35 AM > >>> To: rsyslog-users > >>> Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to > >>> Oracle > >>> > >>> I accidently deleted your reply, so I am replying to my message > >>> > >>> 4-5 messages/secshould not be any problem at all. > >>> > >>> since writing a copy of the sql commands to a file doesn't show > >>> anything strange, the next step will be to try and run it in debug > >>> mode to see if it logs anything strange as you start having > problems. > >>> > >>> the next step after that will probably be to do a strace of the > >>> thread that is using all the CPU > >>> > >>> I don't see where you mention what version of rsyslog you are > using. > >>> > >>> David Lang > >>> > >>> On Mon, 9 May 2011, [email protected] wrote: > >>> > >>>> Date: Mon, 9 May 2011 10:56:30 -0700 (PDT) > >>>> From: [email protected] > >>>> Reply-To: rsyslog-users <[email protected]> > >>>> To: rsyslog-users <[email protected]> > >>>> Subject: Re: [rsyslog] help : CPU high load of rsyslog writing to > >>>> Oracle > >>>> > >>>> given that it only happens after a few messages, I would suspect > >>>> that there is something odd with the messages. > >>>> > >>>> could you try writing the messages to a file using the Oracle > >>>> template and then look and see if there is anything odd about the > >>>> messages when it goes nuts? > >>>> > >>>> the other option is to get a debug dump, but unless it happens > >>>> right away, that is likely to be a large dump. > >>>> > >>>> David Lang > >>>> > >>>> On Mon, 9 May 2011, bxu wrote: > >>>> > >>>>> Date: Mon, 9 May 2011 10:48:50 -0500 > >>>>> From: bxu <[email protected]> > >>>>> Reply-To: rsyslog-users <[email protected]> > >>>>> To: [email protected] > >>>>> Subject: [rsyslog] help : CPU high load of rsyslog writing to > >>>>> Oracle > >>>>> > >>>>> Hi All, > >>>>> > >>>>> > >>>>> I have configured rsyslog writing to Oracle.The writing was > >>>>> successful.But after several messages have been made into > >>>>> Oracle,the CPU use percentage of rsyslogd goes up to 99%-100%.It > >>>>> is abnormally too > >>> much high. > >>>>> There's no problem with the cpu usage when writing to files. > >>>>> > >>>>> Here is my rsyslogd.conf: > >>>>> #==================================================== > >>>>> #*.* >192.168.45.213,rsyslog > >>>>> > >>>>> # Use traditional timestamp format $ActionFileDefaultTemplate > >>>>> RSYSLOG_TraditionalFileFormat > >>>>> > >>>>> # Provides kernel logging support (previously done by rklogd) > >>>>> $ModLoad imklog # Provides support for local system logging (e.g. > >>>>> via logger command) $ModLoad imuxsock > >>>>> > >>>>> # Log all kernel messages to the console. > >>>>> # Logging much else clutters up the screen. > >>>>> #kern.* /dev/console > >>>>> > >>>>> # Log anything (except mail) of level info or higher. > >>>>> # Don't log private authentication messages! > >>>>> *.info;mail.none;authpriv.none;cron.none /var/log/messages > >>>>> > >>>>> # The authpriv file has restricted access. > >>>>> authpriv.* /var/log/secure > >>>>> > >>>>> # Log all the mail messages in one place. > >>>>> mail.* -/var/log/maillog > >>>>> > >>>>> # Log cron stuff > >>>>> cron.* /var/log/cron > >>>>> > >>>>> # Everybody gets emergency messages *.emerg * > >>>>> > >>>>> # Save news errors of level crit and higher in a special file. > >>>>> uucp,news.crit /var/log/spooler > >>>>> > >>>>> # Save boot messages also to boot.log > >>>>> local7.* /var/log/boot.log > >>>>> > >>>>> #Load External Listender > >>>>> $modload imudp > >>>>> > >>>>> > >>>>> $UDPServerRun 514 > >>>>> #SPA Messages > >>>>> #*.* /var/log/linksys.log > >>>>> $template DynaFile,"/var/log/system-%HOSTNAME%.log" > >>>>> *.* -?DynaFile > >>>>> > >>>>> > >>>>> ################ > >>>>> #### ORACLE #### > >>>>> ################ > >>>>> > >>>>> $ModLoad omoracle > >>>>> $OmoracleDBUser syslog > >>>>> $OmoracleDBPassword syslog > >>>>> $OmoracleDB syslog > >>>>> $OmoracleBatchSize 1 > >>>>> $OmoracleBatchItemSize 4096 > >>>>> $OmoracleStatementTemplate OmoracleStatement > >>>>> > >>>>> $template OmoracleStatement,"INSERT INTO TEST(hostname,message) > >>>>> VALUES(:hostname,:msg)" > >>>>> $template TestStmt,"%hostname%%msg%" > >>>>> > >>>>> *.* :omoracle:;TestStmt > >>>>> #==================================================== > >>>>> > >>>>> Really appreciate anyone's help! > >>>>> > >>>>> Best > >>>>> > >>>>> _______________________________________________ > >>>>> rsyslog mailing list > >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>>>> http://www.rsyslog.com > >>>>> > >>>> _______________________________________________ > >>>> rsyslog mailing list > >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>>> http://www.rsyslog.com > >>>> > >>> _______________________________________________ > >>> rsyslog mailing list > >>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>> http://www.rsyslog.com > >>> > >>> -- > >>> This message has been scanned for viruses and dangerous content by > >>> MailScanner, and is believed to be clean. > >>> > >>> _______________________________________________ > >>> rsyslog mailing list > >>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>> http://www.rsyslog.com > >>> > >> _______________________________________________ > >> rsyslog mailing list > >> http://lists.adiscon.net/mailman/listinfo/rsyslog > >> http://www.rsyslog.com > >> > >> -- > >> This message has been scanned for viruses and dangerous content by > >> MailScanner, and is believed to be clean. > >> > > -- > > This message has been scanned for viruses and dangerous content by > > MailScanner, and is believed to be clean. > > > > > -- > This message has been scanned for viruses and dangerous content by > MailScanner, and is believed to be clean. _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

