I've converted the legacy $MainQueue% options over to the new RainerScript
style and confirmed the queues (main and action) are dequeueing in the
proper designated chunk sizes now.

I've seen nothing else in the rsyslog debug log which would indicate a
problem, but then again I might be missing something.  Any help or guidance
on how to properly debug this bottleneck would be greatly appreciated.

Thanks,
Steven Libby

On Tue, Nov 7, 2017 at 2:06 PM, Steven Libby <libby.ste...@gmail.com> wrote:

> We had configured large batches (8192) for the main queue, action queue,
> and udp input module.  However, after a brief debugging session, it appears
> that at least two of these settings are not being honored...
>
> The database action queue seems to be deleting maximum batch sizes of 8 at
> a time:
>
> 7968.410369112:SecInfra2ArchiveDB queue[DA]:Reg/w0: regular consumer
> finished, iret=0, szlog 235270 sz phys 235278
> 7968.410545345:SecInfra2ArchiveDB queue[DA]:Reg/w0: DeleteProcessedBatch:
> we deleted 8 objects and enqueued 0 objects
> 7968.410554591:SecInfra2ArchiveDB queue[DA]:Reg/w0: doDeleteBatch: delete
> batch from store, new sizes: log 235270, phys 235270
>
> And the main queue appears to be dequeueing in maximum chunks of 256:
>
> 7963.597767589:main Q:Reg/w0  : END batch execution phase, entering to
> commit phase [processed 25 of 25 messages]
> 7963.616701318:main Q:Reg/w0  : END batch execution phase, entering to
> commit phase [processed 38 of 38 messages]
> 7963.669420594:main Q:Reg/w0  : END batch execution phase, entering to
> commit phase [processed 224 of 224 messages]
> 7963.731337893:main Q:Reg/w0  : END batch execution phase, entering to
> commit phase [processed 256 of 256 messages]
> 7963.800962451:main Q:Reg/w0  : END batch execution phase, entering to
> commit phase [processed 256 of 256 messages]
> 7963.857743064:main Q:Reg/w0  : END batch execution phase, entering to
> commit phase [processed 256 of 256 messages]
> 7963.904639838:main Q:Reg/w0  : END batch execution phase, entering to
> commit phase [processed 256 of 256 messages]
> 7963.985528145:main Q:Reg/w0  : END batch execution phase, entering to
> commit phase [processed 256 of 256 messages]
> ...
>
> I'm starting to think some of the options in our config file are incorrect
> or no longer supported...?
>
> Also, is there any specific keywords I should be looking for in the debug
> log?  Or any specific debug options I should be turning on (i.e.
> LogFuncFlow, PrintMutexAction, OutputTidToStderr, etc...)?
>
>
> On Mon, Nov 6, 2017 at 4:54 PM, David Lang <da...@lang.hm> wrote:
>
>> On Mon, 6 Nov 2017, Steven Libby wrote:
>>
>> I've tried removing the ruleset altogether and just using an action to
>>> minimize locking overhead (which I assume would be the same thing you're
>>> referring to here).  But just to make sure, I also tried changing the
>>> ruleset queue type to "Direct" and got the same result.
>>>
>>
>> yes
>>
>> Also, most of the time during my testing I used resumeRetryCount="-1" but
>>> mistakenly left it at "0" when I posted to this mailing list.  Reverting
>>> back to "-1" still results in a backlog in the database action queue.
>>> When
>>> you say "...some insert attempts...are not being accepted by mysql...",
>>> are
>>> you implying that some insert attempts are actually being rejected by
>>> mysql?
>>>
>>
>> that was one possibility when I thought you were not retrying logs.
>>
>>  I have enabled the general query log with the highest level of
>>> verbosity, and also installed the audit plugin, but I am unable to see
>>> any
>>> indication that any insert attempts are being rejected.  I have yet to
>>> enable the DBUG package yet for further investigation, but will try that
>>> next to see if I can glean any new information.
>>>
>>
>> are logs getting inserted one at a time or in large batches? try to
>> increase batching, get a debug log for a bit on the rsyslog side and see if
>> it's having to make multiple attempts to insert messages (for example, you
>> try to deliver up to 8K messages at once, rsyslog could be rejecting this
>> because the total request size is too large and rsyslog then tries to
>> deliver 4K, 2K, 1K... messages until it gets small enough to be accepted)
>>
>> David Lang
>>
>>
>> On a side note, I created a separate test table using the MyISAM storage
>>> engine and the insert rate was improved, but still filled up the database
>>> action queue within several minutes.  I've reduced the durability of
>>> TokuDB
>>> by disabling tokudb_commit_sync and tokudb_fsync_log_period global
>>> variables, but there was zero noticeable performance difference here.
>>> (Background: we use TokuDB for it's compression and secondary clustering
>>> indexes, mainly).  I would expect to see excessive CPU utilization or
>>> Disk
>>> I/O if the database was struggling to keep up with the rsyslog queue, but
>>> perhaps that expectation is incorrect?
>>>
>>> Regards,
>>> Steven
>>>
>>> On Sat, Nov 4, 2017 at 5:03 PM, David Lang <da...@lang.hm> wrote:
>>>
>>> don't put a queue on the ruleset and the only item in the ruleset
>>>>
>>>> when you have to queues like this, you end up with lots of locking
>>>> overhead and poor batch sizing.
>>>>
>>>> with retrycount=0 you never attempt to re-deliver a message that fails,
>>>> if
>>>> you set it to -1 it will retry forever
>>>>
>>>> try setting it to -1 and see if logs are getting stuck (delivery stops),
>>>> if so, you are getting some insert attempts that are not being accepted
>>>> by
>>>> mysql and since you have retries disabled, the entire batch is being
>>>> thrown
>>>> away.
>>>>
>>>>
>>>> On Fri, 3 Nov 2017, Steven Libby via rsyslog wrote:
>>>>
>>>> Date: Fri, 3 Nov 2017 14:53:17 -0400
>>>>
>>>>> From: Steven Libby via rsyslog <rsyslog@lists.adiscon.com>
>>>>> To: rsyslog@lists.adiscon.com
>>>>> Cc: Steven Libby <libby.ste...@gmail.com>
>>>>> Subject: [rsyslog] ommysql database queue bottleneck
>>>>>
>>>>>
>>>>> We are running rsyslog 8.21.0 on a RHEL 6.7 box with 256GB of RAM and
>>>>> 96
>>>>> cores, writing to a MySQL instance (Percona Server 5.6.28-76) using
>>>>> TokuDB
>>>>> as the storage engine.  This is our centralized log server, which
>>>>> receives
>>>>> messages from numerous network devices at an average rate of 7k
>>>>> messages
>>>>> per second.  This message rate is fairly consistent throughout the
>>>>> course
>>>>> of the day.
>>>>>
>>>>> We noticed records missing from the database when comparing with the
>>>>> raw
>>>>> records being written directly to a file on another server.  Based on
>>>>> the
>>>>> investigation so far, it appears the lost messages are being discarded
>>>>> due
>>>>> to queues filling up.
>>>>>
>>>>> After enabling the impstats module, we discovered that the bottleneck
>>>>> appears to be in the database queue; but we're having trouble
>>>>> determining
>>>>> the exact cause of the bottleneck.  Database resource utilization is
>>>>> relatively low (CPU utilization < 4%, Disk I/O < 1%, 5+GB free RAM) and
>>>>> we've tuned the database configuration to remove any unnecessary
>>>>> overhead.
>>>>> We've made numerous configuration changes, both on the DB side and in
>>>>> rsyslogd, and the queue seems to build at roughly the same rate
>>>>> regardless
>>>>> of of everything we've tried.
>>>>>
>>>>> We are successfully writing 3k-4k records per-second into the DB, but
>>>>> the
>>>>> action queue is also building at about that same rate.
>>>>>
>>>>> Here is the latest rsyslog configuration we've tried (below) cleaned up
>>>>> for
>>>>> the purpose of this message.  Most of the incoming messages arrive via
>>>>> imudp.  We've tried adjusting the number of threads in that module, as
>>>>> well
>>>>> as the batchSize, and the number of messages coming in stays about the
>>>>> same, regardless.  We've also tried different combinations of MainMsg
>>>>> queue
>>>>> worker threads and batch sizes, as well as similar adjustments in the
>>>>> ruleset and action configurations.  We've gone through the changelogs
>>>>> up
>>>>> to
>>>>> rsyslog 8.31.0 and didn't see anything to indicate that upgrading would
>>>>> help with the performance.  We've also gone through most of the online
>>>>> documentation at rsyslog.com/doc/v8-stable/ and read some articles by
>>>>> David
>>>>> Long regarding performance improvements and the work he did with
>>>>> Intuit.
>>>>>
>>>>> At this point, any additional help or guidance from the professionals
>>>>> would
>>>>> be greatly appreciated.  Thanks for taking the time to read this...
>>>>>
>>>>>
>>>>> #rsyslog v8.x config file
>>>>> $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
>>>>> $RepeatedMsgContainsOriginalMsg on
>>>>> $OptimizeForUniprocessor off
>>>>> $PreserveFQDN on
>>>>> $WorkDirectory /HSD1/syslogQ
>>>>>
>>>>> $MainMsgQueueFileName mainq-SecInfra
>>>>> $MainMsgQueueMaxDiskSpace 200g
>>>>> $MainMsgQueueMaxFileSize 1g
>>>>> $MainMsgQueueSaveOnShutdown on
>>>>> $MainMsgQueueType LinkedList
>>>>> $MainMsgQueueSize 10485760
>>>>> $MainMsgQueueHighWatermark 10000000
>>>>> $MainMsgQueueWorkerThreads 8
>>>>> $MainMsgDequeueBatchSize 8192
>>>>>
>>>>> module(load="imudp" threads="8" batchSize="8192" timeRequery="10")
>>>>> module(load="imptcp" threads="8" processOnPoller="off" )
>>>>> module(load="imuxsock")
>>>>> module(load="imklog")
>>>>> module(load="immark")
>>>>> module(load="ommysql")
>>>>> module(load="impstats" interval="5" resetCounters="off" format="legacy"
>>>>> log.syslog="off" log.file="/HSD1/locallog/pstat
>>>>> s-SecInfra_2017-11-03.log")
>>>>> module(load="mmutf8fix")
>>>>>
>>>>> $template SecInfraSyslogTmpl,"insert into SecInfraSyslog.syslogRaw
>>>>> (recievedAt,deviceReportedTime,deviceTimezone,fromHost,
>>>>> fromHostIpv4,hostName,facility,priority,localPriority,infoUnitId,
>>>>> inputName,syslogTag,programName,appName,procId,msgId,message
>>>>> ,rawMessage)
>>>>> values ('%timegenerated:::date-mysql%.%timegenerated:::date-
>>>>> subseconds%','%timereported:::date-mysql%.%timereported:::
>>>>> date-subseconds%','%timereported:::date-tzoffsdirection%%timereported:
>>>>> ::date-tzoffshour%','%FROMHOST%','%fromhost-ip%','%
>>>>> HOSTNAME%',%syslogfacility%,%syslogpriority%,'%PRI%',%iut%,
>>>>> '%inputname%','%syslogtag%','%programname%','%APP-NAME%','%
>>>>> PROCID%','%MSGID%','%msg%','%rawmsg%')",SQL
>>>>>
>>>>> ruleset(name="SecInfra"
>>>>>  queue.saveonshutdown="on"
>>>>>  queue.type="LinkedList"
>>>>>  queue.filename="SecInfra-ruleQ"
>>>>>  queue.highwatermark="1000000"
>>>>>  queue.size="1310720"
>>>>>  queue.dequeuebatchsize="8192"
>>>>>  queue.maxdiskspace="200g"
>>>>>  queue.maxfilesize="10g"
>>>>>  queue.workerthreads="2"
>>>>> ) {
>>>>>  action(name="SecInfra2ArchiveDB"
>>>>>    type="ommysql"
>>>>>    server="127.0.0.1"
>>>>>    db="SecInfraSyslog"
>>>>>    uid="rsyslog"
>>>>>    pwd="************"
>>>>>    serverport="3306"
>>>>>    template="SecInfraSyslogTmpl"
>>>>>    queue.saveonshutdown="on"
>>>>>    queue.type="LinkedList"
>>>>>    queue.filename="SecInfraArchive-dbQ"
>>>>>    queue.highwatermark="1000000"
>>>>>    queue.size="1310720"
>>>>>    queue.dequeuebatchsize="8192"
>>>>>    queue.maxdiskspace="200g"
>>>>>    queue.maxfilesize="10g"
>>>>>    queue.workerthreads="2"
>>>>>    action.resumeRetryCount="0"
>>>>>    action.reportSuspension="on"
>>>>>    action.reportSuspensionContinuation="on"
>>>>>  )
>>>>> }
>>>>>
>>>>> input(type="imudp" name="SecInfraUDP" address="************" port="514"
>>>>> ruleset="SecInfra")
>>>>> input(type="imtcp" name="SecInfraTCP" address="************" port="514"
>>>>> ruleset="SecInfra")
>>>>>
>>>>> Regards,
>>>>> Steven Libby
>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com/professional-services/
>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>> myriad
>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>> DON'T LIKE THAT.
>>>>>
>>>>>
>>>>>
>>>
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to