I can see where it is deleting from the database action queue in chunks of 10,000 now:
6702.235489189:SecInfra2ArchiveDB queue:Reg/w0: DeleteProcessedBatch: we deleted 10000 objects and enqueued 0 objects I assume that means they are being inserted in chunks, too. But watching the processlist in mysql, it appears that they are still being inserted one-at-a-time, albeit in a transaction, so maybe that's what is meant by "batch" processing? I see lots of entries like the following, but these appear to originate from the main queue, not the db action queue. Is that a correct interpretation? ... 6702.597844624:main Q:Reg/w0 : SecInfra2ArchiveDB queue: EnqueueMsg advised worker start 6702.597854174:main Q:Reg/w0 : action 'SecInfra2ArchiveDB': set suspended state to 0 6702.597861111:main Q:Reg/w0 : processBATCH: next msg 649: <14>1 2017-11-07T15:31:42.255-05:00 JF-ITE-PM2-00C RT_FLOW - RT_FLOW_SESSION_CREATE [junos@2636.1.1.1.2.28 source-address="10.24 6702.597867433:main Q:Reg/w0 : 6702.597880487:SecInfra2ArchiveDB queue:Reg/w0: dnscache: entry 0x7febc4017ed0 found 6702.597894312:main Q:Reg/w0 : ACTION 0 [ommysql:action(type="ommysql" ...)] 6702.597907683:main Q:Reg/w0 : executing action 0 6702.597919469:main Q:Reg/w0 : action 'SecInfra2ArchiveDB': called, logging to ommysql (susp 0/0, direct q 0) ... Thanks, Steven Libby On Wed, Nov 8, 2017 at 4:32 PM, David Lang <da...@lang.hm> wrote: > are the inserts into MySQL happening in large chunks now? > > if the queue is filling, it should be dequeueing the messages in large > batches and submitting them to MySQL in batches, if the batches to MySQL > are too small, you could have trouble keeping up. > > look in the debug logs for how many message are processed between calls to > insert them into MySQL. > > David Lang > > On Wed, 8 Nov 2017, Steven Libby wrote: > > Date: Wed, 8 Nov 2017 16:26:08 -0500 >> From: Steven Libby <libby.ste...@gmail.com> >> To: David Lang <da...@lang.hm> >> Cc: Steven Libby via rsyslog <rsyslog@lists.adiscon.com> >> Subject: Re: [rsyslog] ommysql database queue bottleneck >> >> >> I've converted the legacy $MainQueue% options over to the new RainerScript >> style and confirmed the queues (main and action) are dequeueing in the >> proper designated chunk sizes now. >> >> I've seen nothing else in the rsyslog debug log which would indicate a >> problem, but then again I might be missing something. Any help or >> guidance >> on how to properly debug this bottleneck would be greatly appreciated. >> >> Thanks, >> Steven Libby >> >> On Tue, Nov 7, 2017 at 2:06 PM, Steven Libby <libby.ste...@gmail.com> >> wrote: >> >> We had configured large batches (8192) for the main queue, action queue, >>> and udp input module. However, after a brief debugging session, it >>> appears >>> that at least two of these settings are not being honored... >>> >>> The database action queue seems to be deleting maximum batch sizes of 8 >>> at >>> a time: >>> >>> 7968.410369112:SecInfra2ArchiveDB queue[DA]:Reg/w0: regular consumer >>> finished, iret=0, szlog 235270 sz phys 235278 >>> 7968.410545345:SecInfra2ArchiveDB queue[DA]:Reg/w0: >>> DeleteProcessedBatch: >>> we deleted 8 objects and enqueued 0 objects >>> 7968.410554591:SecInfra2ArchiveDB queue[DA]:Reg/w0: doDeleteBatch: >>> delete >>> batch from store, new sizes: log 235270, phys 235270 >>> >>> And the main queue appears to be dequeueing in maximum chunks of 256: >>> >>> 7963.597767589:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 25 of 25 messages] >>> 7963.616701318:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 38 of 38 messages] >>> 7963.669420594:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 224 of 224 messages] >>> 7963.731337893:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 256 of 256 messages] >>> 7963.800962451:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 256 of 256 messages] >>> 7963.857743064:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 256 of 256 messages] >>> 7963.904639838:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 256 of 256 messages] >>> 7963.985528145:main Q:Reg/w0 : END batch execution phase, entering to >>> commit phase [processed 256 of 256 messages] >>> ... >>> >>> I'm starting to think some of the options in our config file are >>> incorrect >>> or no longer supported...? >>> >>> Also, is there any specific keywords I should be looking for in the debug >>> log? Or any specific debug options I should be turning on (i.e. >>> LogFuncFlow, PrintMutexAction, OutputTidToStderr, etc...)? >>> >>> >>> On Mon, Nov 6, 2017 at 4:54 PM, David Lang <da...@lang.hm> wrote: >>> >>> On Mon, 6 Nov 2017, Steven Libby wrote: >>>> >>>> I've tried removing the ruleset altogether and just using an action to >>>> >>>>> minimize locking overhead (which I assume would be the same thing >>>>> you're >>>>> referring to here). But just to make sure, I also tried changing the >>>>> ruleset queue type to "Direct" and got the same result. >>>>> >>>>> >>>> yes >>>> >>>> Also, most of the time during my testing I used resumeRetryCount="-1" >>>> but >>>> >>>>> mistakenly left it at "0" when I posted to this mailing list. >>>>> Reverting >>>>> back to "-1" still results in a backlog in the database action queue. >>>>> When >>>>> you say "...some insert attempts...are not being accepted by mysql...", >>>>> are >>>>> you implying that some insert attempts are actually being rejected by >>>>> mysql? >>>>> >>>>> >>>> that was one possibility when I thought you were not retrying logs. >>>> >>>> I have enabled the general query log with the highest level of >>>> >>>>> verbosity, and also installed the audit plugin, but I am unable to see >>>>> any >>>>> indication that any insert attempts are being rejected. I have yet to >>>>> enable the DBUG package yet for further investigation, but will try >>>>> that >>>>> next to see if I can glean any new information. >>>>> >>>>> >>>> are logs getting inserted one at a time or in large batches? try to >>>> increase batching, get a debug log for a bit on the rsyslog side and >>>> see if >>>> it's having to make multiple attempts to insert messages (for example, >>>> you >>>> try to deliver up to 8K messages at once, rsyslog could be rejecting >>>> this >>>> because the total request size is too large and rsyslog then tries to >>>> deliver 4K, 2K, 1K... messages until it gets small enough to be >>>> accepted) >>>> >>>> David Lang >>>> >>>> >>>> On a side note, I created a separate test table using the MyISAM storage >>>> >>>>> engine and the insert rate was improved, but still filled up the >>>>> database >>>>> action queue within several minutes. I've reduced the durability of >>>>> TokuDB >>>>> by disabling tokudb_commit_sync and tokudb_fsync_log_period global >>>>> variables, but there was zero noticeable performance difference here. >>>>> (Background: we use TokuDB for it's compression and secondary >>>>> clustering >>>>> indexes, mainly). I would expect to see excessive CPU utilization or >>>>> Disk >>>>> I/O if the database was struggling to keep up with the rsyslog queue, >>>>> but >>>>> perhaps that expectation is incorrect? >>>>> >>>>> Regards, >>>>> Steven >>>>> >>>>> On Sat, Nov 4, 2017 at 5:03 PM, David Lang <da...@lang.hm> wrote: >>>>> >>>>> don't put a queue on the ruleset and the only item in the ruleset >>>>> >>>>>> >>>>>> when you have to queues like this, you end up with lots of locking >>>>>> overhead and poor batch sizing. >>>>>> >>>>>> with retrycount=0 you never attempt to re-deliver a message that >>>>>> fails, >>>>>> if >>>>>> you set it to -1 it will retry forever >>>>>> >>>>>> try setting it to -1 and see if logs are getting stuck (delivery >>>>>> stops), >>>>>> if so, you are getting some insert attempts that are not being >>>>>> accepted >>>>>> by >>>>>> mysql and since you have retries disabled, the entire batch is being >>>>>> thrown >>>>>> away. >>>>>> >>>>>> >>>>>> On Fri, 3 Nov 2017, Steven Libby via rsyslog wrote: >>>>>> >>>>>> Date: Fri, 3 Nov 2017 14:53:17 -0400 >>>>>> >>>>>> From: Steven Libby via rsyslog <rsyslog@lists.adiscon.com> >>>>>>> To: rsyslog@lists.adiscon.com >>>>>>> Cc: Steven Libby <libby.ste...@gmail.com> >>>>>>> Subject: [rsyslog] ommysql database queue bottleneck >>>>>>> >>>>>>> >>>>>>> We are running rsyslog 8.21.0 on a RHEL 6.7 box with 256GB of RAM and >>>>>>> 96 >>>>>>> cores, writing to a MySQL instance (Percona Server 5.6.28-76) using >>>>>>> TokuDB >>>>>>> as the storage engine. This is our centralized log server, which >>>>>>> receives >>>>>>> messages from numerous network devices at an average rate of 7k >>>>>>> messages >>>>>>> per second. This message rate is fairly consistent throughout the >>>>>>> course >>>>>>> of the day. >>>>>>> >>>>>>> We noticed records missing from the database when comparing with the >>>>>>> raw >>>>>>> records being written directly to a file on another server. Based on >>>>>>> the >>>>>>> investigation so far, it appears the lost messages are being >>>>>>> discarded >>>>>>> due >>>>>>> to queues filling up. >>>>>>> >>>>>>> After enabling the impstats module, we discovered that the bottleneck >>>>>>> appears to be in the database queue; but we're having trouble >>>>>>> determining >>>>>>> the exact cause of the bottleneck. Database resource utilization is >>>>>>> relatively low (CPU utilization < 4%, Disk I/O < 1%, 5+GB free RAM) >>>>>>> and >>>>>>> we've tuned the database configuration to remove any unnecessary >>>>>>> overhead. >>>>>>> We've made numerous configuration changes, both on the DB side and in >>>>>>> rsyslogd, and the queue seems to build at roughly the same rate >>>>>>> regardless >>>>>>> of of everything we've tried. >>>>>>> >>>>>>> We are successfully writing 3k-4k records per-second into the DB, but >>>>>>> the >>>>>>> action queue is also building at about that same rate. >>>>>>> >>>>>>> Here is the latest rsyslog configuration we've tried (below) cleaned >>>>>>> up >>>>>>> for >>>>>>> the purpose of this message. Most of the incoming messages arrive >>>>>>> via >>>>>>> imudp. We've tried adjusting the number of threads in that module, >>>>>>> as >>>>>>> well >>>>>>> as the batchSize, and the number of messages coming in stays about >>>>>>> the >>>>>>> same, regardless. We've also tried different combinations of MainMsg >>>>>>> queue >>>>>>> worker threads and batch sizes, as well as similar adjustments in the >>>>>>> ruleset and action configurations. We've gone through the changelogs >>>>>>> up >>>>>>> to >>>>>>> rsyslog 8.31.0 and didn't see anything to indicate that upgrading >>>>>>> would >>>>>>> help with the performance. We've also gone through most of the >>>>>>> online >>>>>>> documentation at rsyslog.com/doc/v8-stable/ and read some articles >>>>>>> by >>>>>>> David >>>>>>> Long regarding performance improvements and the work he did with >>>>>>> Intuit. >>>>>>> >>>>>>> At this point, any additional help or guidance from the professionals >>>>>>> would >>>>>>> be greatly appreciated. Thanks for taking the time to read this... >>>>>>> >>>>>>> >>>>>>> #rsyslog v8.x config file >>>>>>> $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat >>>>>>> $RepeatedMsgContainsOriginalMsg on >>>>>>> $OptimizeForUniprocessor off >>>>>>> $PreserveFQDN on >>>>>>> $WorkDirectory /HSD1/syslogQ >>>>>>> >>>>>>> $MainMsgQueueFileName mainq-SecInfra >>>>>>> $MainMsgQueueMaxDiskSpace 200g >>>>>>> $MainMsgQueueMaxFileSize 1g >>>>>>> $MainMsgQueueSaveOnShutdown on >>>>>>> $MainMsgQueueType LinkedList >>>>>>> $MainMsgQueueSize 10485760 >>>>>>> $MainMsgQueueHighWatermark 10000000 >>>>>>> $MainMsgQueueWorkerThreads 8 >>>>>>> $MainMsgDequeueBatchSize 8192 >>>>>>> >>>>>>> module(load="imudp" threads="8" batchSize="8192" timeRequery="10") >>>>>>> module(load="imptcp" threads="8" processOnPoller="off" ) >>>>>>> module(load="imuxsock") >>>>>>> module(load="imklog") >>>>>>> module(load="immark") >>>>>>> module(load="ommysql") >>>>>>> module(load="impstats" interval="5" resetCounters="off" >>>>>>> format="legacy" >>>>>>> log.syslog="off" log.file="/HSD1/locallog/pstat >>>>>>> s-SecInfra_2017-11-03.log") >>>>>>> module(load="mmutf8fix") >>>>>>> >>>>>>> $template SecInfraSyslogTmpl,"insert into SecInfraSyslog.syslogRaw >>>>>>> (recievedAt,deviceReportedTime,deviceTimezone,fromHost, >>>>>>> fromHostIpv4,hostName,facility,priority,localPriority,infoUnitId, >>>>>>> inputName,syslogTag,programName,appName,procId,msgId,message >>>>>>> ,rawMessage) >>>>>>> values ('%timegenerated:::date-mysql%.%timegenerated:::date- >>>>>>> subseconds%','%timereported:::date-mysql%.%timereported::: >>>>>>> date-subseconds%','%timereported:::date-tzoffsdirection%% >>>>>>> timereported: >>>>>>> ::date-tzoffshour%','%FROMHOST%','%fromhost-ip%','% >>>>>>> HOSTNAME%',%syslogfacility%,%syslogpriority%,'%PRI%',%iut%, >>>>>>> '%inputname%','%syslogtag%','%programname%','%APP-NAME%','% >>>>>>> PROCID%','%MSGID%','%msg%','%rawmsg%')",SQL >>>>>>> >>>>>>> ruleset(name="SecInfra" >>>>>>> queue.saveonshutdown="on" >>>>>>> queue.type="LinkedList" >>>>>>> queue.filename="SecInfra-ruleQ" >>>>>>> queue.highwatermark="1000000" >>>>>>> queue.size="1310720" >>>>>>> queue.dequeuebatchsize="8192" >>>>>>> queue.maxdiskspace="200g" >>>>>>> queue.maxfilesize="10g" >>>>>>> queue.workerthreads="2" >>>>>>> ) { >>>>>>> action(name="SecInfra2ArchiveDB" >>>>>>> type="ommysql" >>>>>>> server="127.0.0.1" >>>>>>> db="SecInfraSyslog" >>>>>>> uid="rsyslog" >>>>>>> pwd="************" >>>>>>> serverport="3306" >>>>>>> template="SecInfraSyslogTmpl" >>>>>>> queue.saveonshutdown="on" >>>>>>> queue.type="LinkedList" >>>>>>> queue.filename="SecInfraArchive-dbQ" >>>>>>> queue.highwatermark="1000000" >>>>>>> queue.size="1310720" >>>>>>> queue.dequeuebatchsize="8192" >>>>>>> queue.maxdiskspace="200g" >>>>>>> queue.maxfilesize="10g" >>>>>>> queue.workerthreads="2" >>>>>>> action.resumeRetryCount="0" >>>>>>> action.reportSuspension="on" >>>>>>> action.reportSuspensionContinuation="on" >>>>>>> ) >>>>>>> } >>>>>>> >>>>>>> input(type="imudp" name="SecInfraUDP" address="************" >>>>>>> port="514" >>>>>>> ruleset="SecInfra") >>>>>>> input(type="imtcp" name="SecInfraTCP" address="************" >>>>>>> port="514" >>>>>>> ruleset="SecInfra") >>>>>>> >>>>>>> Regards, >>>>>>> Steven Libby >>>>>>> _______________________________________________ >>>>>>> rsyslog mailing list >>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>>> http://www.rsyslog.com/professional-services/ >>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards >>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a >>>>>>> myriad >>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if >>>>>>> you >>>>>>> DON'T LIKE THAT. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>> >> _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/ What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.