On Wed, 5 Jun 2013, Roberto Giordani wrote:
Hello David,
your last email was very useful to me and I appreciated the explanation! :-)
I can't go into a lot of depth right now, so I'll reply more later
I've changed the client and server configuration after the benchmark test, as
you can see, but I'm still suffering on the server side
1) Attached are the two .png graphs of the bandwidth available between client
and server (about 200-300 Mbit/s)
2) The thread on the server that used 100% CPU was MainQ, so I've increased the
thread count to 20 and now all threads work at 20%, using all CPUs
try doing a strace on that thread for a few seconds and see what it's spending
its time doing
the fact that all the threads together aren't passing 100% suggests that you may
have some common bottleneck.
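As a sketch of how to do that (these are standard Linux tools; the thread ids and the rsyslogd process name are whatever your box shows):

```shell
# Show per-thread CPU usage for rsyslogd; the TID column gives the
# thread ids (the same numbers top shows after you hit 'H'):
pid=$(pidof rsyslogd)
ps -L -o tid,pcpu,comm -p "$pid"

# Pick the busiest thread and let strace summarize its syscalls for
# ten seconds (-c prints per-syscall counts/time instead of a raw trace):
tid=$(ps -L -o tid=,pcpu= -p "$pid" | sort -k2 -rn | awk 'NR==1 {print $1}')
timeout 10 strace -c -p "$tid"
```

The -c summary usually makes it obvious whether the thread is burning its time in write(), futex contention, or something else.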
3) I've reduced the number of input files for each client, selecting the most
used ones, and reduced the corresponding actions on the server; attached is the
rsyslog-pstats output of server fsp01
4) this is the client conf
$MainMsgQueueType LinkedList
$MainMsgQueueSize 1000000
#$MainMsgQueueWorkerThreads 2
$MainMsgQueueDequeueBatchSize 5000
$MainMsgQueueSaveOnShutdown on
$ActionResumeRetryCount -1 # infinite retries if host is down
$ActionQueueTimeoutEnqueue 1 # after it fills up, also the main queue fills up
$ActionSendResendLastMsgOnReconnect on # the last message is to be resent when a connection breaks and has been reconnected.
$ActionQueueCheckpointInterval 1 #
local6.debug @@fsp01:10516
5) this is the server conf
# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 10514
$InputTCPServerRun 10515
$InputTCPServerRun 10516
#### GLOBAL DIRECTIVES ####
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# File syncing capability is disabled by default. This feature is usually not
# required, not useful and an extreme performance hit
$ActionFileEnableSync off
$umask 0007
$MainMsgQueueType LinkedList
$MainMsgQueueSize 1000000
$MainMsgQueueWorkerThreads 20
$MainMsgQueueDequeueBatchSize 5000
$MainMsgQueueSaveOnShutdown on
And each config file included in the server conf has 5 actions
the actions could be the bottleneck.
When I tested, FixedArray (the default) was faster than LinkedList. Yes, it may
use more memory, but on a dedicated server this is reasonable (and you want to
tune the max size to avoid paging in any case, and a fixed allocation is easier
to check)
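In legacy directive syntax that change would look something like this (FixedArray is the default type, so simply removing the LinkedList lines has the same effect; the size shown is only an illustration):

```
$MainMsgQueueType FixedArray
# fixed allocation: pick a size that fits comfortably in RAM without paging
$MainMsgQueueSize 500000
```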
My questions today are:
a) each client has an InputFileStateFile, but my application rotates its own
files at 00:00. After logrotate, does rsyslog use the new, empty input file?
I'm not sure what you are referring to, but I believe that rsyslog will find the
new file and start from the beginning of it.
b) if for some reason I need to start the rsyslog client on a new server that
already has an input file with a size of 300M, is there a way to start logging
only the new lines inserted in that file? I think the answer is no :-)
I think that you are correct
c) you said that each client uses a thread per input file, but I see that it
starts sending one file and doesn't send messages from different input files at
the same time.
How can I speed up the sending process? The server has 20 threads to receive messages.
I think that what happens is that rsyslog starts on a file and does all the work
for the one file before moving to the next. After it's up to date on each file
it will cycle through them looking for new data
d) Due to the LinkedList mode for the main queue and action queue I don't want
to lose any messages, but if I set InputFilePersistStateInterval to 10 I get a
lot of I/O, so I have set it to 10000. Is that correct?
if you are talking about data on the order of 100,000 logs/sec, set the interval
to at least 10,000 if not 100,000. You don't want to write this data out
frequently; every second or every few seconds should be good enough. The reason
for this file is so that if rsyslog crashes it will know roughly where it left
off in the file. Under conditions like that, resending a few seconds of a file
seems reasonable to me.
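To make the trade-off concrete: at the roughly 10,000 messages/second per client mentioned earlier in the thread, the interval translates to state-file writes like this (the numbers are illustrative, not measured):

```
# interval 10     -> ~1000 state writes/sec (heavy I/O)
# interval 10000  -> ~1 state write/sec
# interval 100000 -> ~1 state write every 10 sec
$InputFilePersistStateInterval 10000
```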
How can I prevent message loss without using a disk queue?
e) how can I do the "Disabling the disk fallback..." that you mentioned?
well, if you are getting all your data from files, and you just disable throwing
away messages and retry forever, the reading of the file will block if rsyslog
can't deliver the messages, so the files act as your fallback. There is no need
to read them into rsyslog only to have rsyslog write them back out to disk again
(assuming that you fix the problem before you rotate the files twice, at least).
Just a note that writing to a file and then reading it is pretty inefficient; if
you can arrange direct delivery to rsyslog you will probably like it.
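For comparison, direct delivery can be as simple as the application writing to the local syslog socket, which imuxsock already picks up; the tag and message here are made up:

```shell
# Send one message straight to the local syslog socket (imuxsock)
# instead of appending it to a file that imfile then has to tail:
logger -p local6.debug -t TEST1 "order processed id=12345"
```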
David Lang
Thanks a lot for your support.
Regards,
Roberto.
On 06/04/2013 05:53 PM, David Lang wrote:
On Tue, 4 Jun 2013, Roberto Giordani wrote:
Hello David,
following your suggestions, these are my changes:
1) I've removed the ActionQueue on the client
2) added on the server:
$MainMsgQueueType LinkedList
$MainMsgQueueSize 100000
$MainMsgQueueWorkerThreads 3
3) the DNS lookup was already disabled with the -x option when starting rsyslog
4) I can't use UDP because my next step will be to encrypt messages
and this is the pstats report after 4 hours on the server
2013-06-04T13:09:26.398752+02:00 fsp01 rsyslogd-pstats: imtcp(10514): submitted=4583409
2013-06-04T13:09:26.773487+02:00 fsp01 rsyslogd-pstats: main Q: size=9969 enqueued=5082093 full=158414 discarded.full=0 discarded.nf=0 maxqsize=10000
now the server uses 3 of the 4 available CPUs
I've tested the client locally before forwarding to the server and I don't
observe any issues.
I would like to hear your thoughts on these parameters:
a)$MainMsgQueueDequeueBatchSize
b)$InputFilePersistStateInterval
-) Which values are correct for the previous parameters when using
$MainMsgQueueSize 1000000?
you should not need to change these values from default
I didn't receive an answer to my first question:
-) How is it possible to increase incoming messages and handle them quickly?
lots of things could be the problem, but we need to find out where the
bottleneck is in your system
-) how can I retrieve the number of messages sent and the number of messages
received?
The best way to do this is to setup a benchmark test where you know the
number of messages that are sent and can measure the number received.
-)do you think using more than one listener on the server is better?
only if the listener is the bottleneck and you have multiple senders.
If the server is now using 3 out of 4 CPUs, then it's possible that one of
the CPUs is maxing out and that is limiting your overall throughput. With a
simple "receive and write to disk" configuration, rsyslog probably only
uses a few threads, so it may not need the 4th cpu.
when you look at top with threads enabled (after hitting 'H') on the server,
is one of the syslog threads using 100% CPU (or close to it)?
Some of the places you could be having bottlenecks:
1. the receiving thread could be bottlenecking, in which case using imptcp
could be a win
2. activity on the main queue could be the bottleneck, in which case
allowing larger batch sizes could help. Disabling the disk fallback could
help, using array instead of linkedlist could help.
3. output to disk could be the bottleneck; what filesystem you are using, how
large the files get before you rotate them, and what fsync activity is taking
place on the filesystem could all cause problems. Dynamic file names do have
some overhead.
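Point 1 above can be tested with a one-line change on the server; the configure line quoted later in this thread already includes --enable-imptcp:

```
# swap the plain-tcp input in for imtcp on the same port:
$ModLoad imptcp
$InputPTCPServerRun 10514
```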
set up a test file that has a few hundred million log messages in it, then see
how long it takes to send that file. If you have enough RAM on the central box,
set your max queue size very large (1M or so) and repeat the test, looking
specifically to see if your sending box finishes noticeably before the
receiving box (if so, the bottleneck is in the output, not the input)
While I understand that you are looking to add encryption to your
connection in the near future, do a test with UDP instead of TCP just to
see if that is the bottleneck
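For that UDP test, a single '@' in the client action selects UDP ('@@' is TCP), and the server needs imudp loaded; this is a temporary test config, not a production recommendation:

```
# client side:
local6.debug @10.10.1.10:10514

# server side:
$ModLoad imudp
$UDPServerRun 10514
```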
Sometimes I find it useful to do a strace on threads that are using a lot
of cpu as the system calls that are being made let me know what that
particular thread is working on.
David Lang
Regards,
Roberto.
On 06/04/2013 08:32 AM, David Lang wrote:
On Mon, 3 Jun 2013, Roberto Giordani wrote:
Hello,
I'm working on a project where 20 RHEL 5.8 servers (with rsyslog 5.8.12) each
have 20 input files and send about 10,000 messages per second to one rsyslog
server running version 7.2
The network between client and server is gigabit, and this is the daily nmon
network traffic graph
The first configuration used a disk queue, but it was too slow, so I've
switched to the LinkedList queue.
in my testing memory was the fastest (around 1000x as fast as a full
disk queue)
The current client settings are
$InputFileName /file1.log
$InputFileTag TEST1
$InputFileStateFile file1
$InputFileSeverity debug
$InputFileFacility local6
$InputRunFileMonitor
$InputFilePersistStateInterval 10
....
....
....
############################################
$MaxMessageSize 9000
$MainMsgQueueType LinkedList
$MainMsgQueueSize 1000000
$MainMsgQueueWorkerThreads 20
$MainMsgQueueDequeueBatchSize 5000
$MainMsgQueueSaveOnShutdown on
$ActionQueueType LinkedList
$ActionQueueSize 2000000
$ActionQueueWorkerThreads 50
$ActionQueueDequeueBatchSize 5000
$ActionQueueSaveOnShutdown on
###########################################
$ActionResumeRetryCount -1
$ActionQueueTimeoutEnqueue 1
$ActionSendResendLastMsgOnReconnect on
$ActionQueueCheckpointInterval 1
local6.debug @@10.10.1.10:10514
The server settings are
##############################################
# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 10514
#### GLOBAL DIRECTIVES ####
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# File syncing capability is disabled by default. This feature is usually
# not required, not useful and an extreme performance hit
$ActionFileEnableSync off
$umask 0007
if $fromhost-ip == "10.10.1.7" and $syslogfacility-text == "local6"
and $syslogseverity-text == "debug" and $syslogtag == "TEST1" then
action(type="omfile" DirOwner="user1" DirCreateMode="0750"
FileCreateMode="0444" File="/rsyslog-data/file1.log")
if $fromhost-ip == ........
###############################
My questions are:
1) how do I find the right combination of main queue ---> action queue --->
receiver queue on the rsyslog server?
why do you need an Action Queue? it's only needed if you have some
reason to think that that particular action is going to fall behind and
you want to decouple it from the rest of the outputs.
2) Is it possible to increase incoming messages and handle them quickly?
Yes, I've processed 380K logs/sec (gig-e limit) and others with 10G
networks have processed 1M logs/sec
3) Why are the output logs on the centralized rsyslog server still behind the
source logs on the client?
lots of possible reasons, but first off what are you looking at that's
saying that the central servers are behind?
start off by seeing if you are having problems reading the data from the file.
Set up a test where you just read from the file and write back out to a file
(I'd suggest starting without any queue directives at all). Then, if that's
fast enough, change it to send remotely and report what's happening
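A minimal self-test config along those lines, with no queue directives at all (the file names are the placeholders from the client config quoted above; the output path is made up):

```
$ModLoad imfile
$InputFileName /file1.log
$InputFileTag TEST1
$InputFileStateFile file1-selftest
$InputFileSeverity debug
$InputFileFacility local6
$InputRunFileMonitor

# write straight back out to a local file; if this keeps up with the
# source, the reading side is not the problem
local6.debug /var/log/imfile-selftest.log
```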
I've activated the pstats module to understand the queue status on
clients and server.
After 6 hours of running, the client has this report
.........
2013-06-03T22:23:16.708288+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9327675 full=6831507 maxqsize=2000000
2013-06-03T22:23:16.708297+02:00 app01 rsyslogd-pstats: main Q: size=2 enqueued=9648448 full=0 maxqsize=20395
2013-06-03T22:23:46.708367+02:00 app01 rsyslogd-pstats: imuxsock: submitted=323414 ratelimit.discarded=0 ratelimit.numratelimiters=29974
2013-06-03T22:23:46.708382+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9340578 full=6844410 maxqsize=2000000
2013-06-03T22:23:46.708390+02:00 app01 rsyslogd-pstats: main Q: size=2164 enqueued=9666464 full=0 maxqsize=20395
2013-06-03T22:24:16.708923+02:00 app01 rsyslogd-pstats: imuxsock: submitted=328198 ratelimit.discarded=0 ratelimit.numratelimiters=29986
2013-06-03T22:24:16.708941+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9355649 full=6859481 maxqsize=2000000
2013-06-03T22:24:16.708949+02:00 app01 rsyslogd-pstats: main Q: size=1364 enqueued=9686593 full=0 maxqsize=20395
2013-06-03T22:24:46.709300+02:00 app01 rsyslogd-pstats: imuxsock: submitted=333070 ratelimit.discarded=0 ratelimit.numratelimiters=29997
2013-06-03T22:24:46.709316+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9365276 full=6869108 maxqsize=2000000
2013-06-03T22:24:46.709323+02:00 app01 rsyslogd-pstats: main Q: size=2123 enqueued=9702047 full=0 maxqsize=20395
2013-06-03T22:25:16.709807+02:00 app01 rsyslogd-pstats: imuxsock: submitted=337951 ratelimit.discarded=0 ratelimit.numratelimiters=30009
2013-06-03T22:25:16.709823+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9379492 full=6883324 maxqsize=2000000
2013-06-03T22:25:16.709832+02:00 app01 rsyslogd-pstats: main Q: size=2 enqueued=9719723 full=0 maxqsize=20395
2013-06-03T22:25:46.709942+02:00 app01 rsyslogd-pstats: imuxsock: submitted=343014 ratelimit.discarded=0 ratelimit.numratelimiters=30021
2013-06-03T22:25:46.709980+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9389640 full=6893472 maxqsize=2000000
and the rsyslog server has the following stats
.......
2013-06-03T23:05:15.898682+02:00 fsp01 rsyslogd-pstats: main Q: size=9978 enqueued=5032165 full=156941 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:06:56.157199+02:00 fsp01 rsyslogd-pstats: main Q: size=9987 enqueued=5065134 full=157971 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:08:30.657673+02:00 fsp01 rsyslogd-pstats: main Q: size=9972 enqueued=5096315 full=158942 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:10:09.895850+02:00 fsp01 rsyslogd-pstats: main Q: size=9986 enqueued=5129162 full=159969 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:11:42.488505+02:00 fsp01 rsyslogd-pstats: main Q: size=9973 enqueued=5159935 full=160933 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:13:23.213800+02:00 fsp01 rsyslogd-pstats: main Q: size=9973 enqueued=5193246 full=161973 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:14:58.833570+02:00 fsp01 rsyslogd-pstats: main Q: size=9970 enqueued=5224922 full=162962 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:16:35.184133+02:00 fsp01 rsyslogd-pstats: main Q: size=9975 enqueued=5256863 full=163960 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:18:13.992958+02:00 fsp01 rsyslogd-pstats: main Q: size=9991 enqueued=5289392 full=164977 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:19:52.464473+02:00 fsp01 rsyslogd-pstats: main Q: size=9942 enqueued=5322013 full=165996 discarded.full=0 discarded.nf=0 maxqsize=10000
4) Why does the enqueued value on the client never decrease, and why is full=
always different from 0?
well, once you have enqueued something, you don't go back in time and
not enqueue it, so unless you restart rsyslog, that number is going to
just keep climbing.
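Since the counters are cumulative, the number to watch is the delta between two samples divided by the sample interval. Using two of the "main Q" client samples above, which are 30 seconds apart:

```shell
# cumulative enqueued counts from two consecutive pstats samples:
e1=9648448   # sample at 22:23:16
e2=9666464   # sample at 22:23:46
echo $(( (e2 - e1) / 30 ))   # prints 600, i.e. ~600 msgs/sec enqueued
```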
I'm planning to distribute the 20 clients across 2 rsyslog server processes on
different ports on the same server, but I need your help with the right
combination of action queue on the client and main queue on the server.
In the rsyslog documentation I've found a lot of info about handling the input
queue (number of threads, max messages and DequeueBatch), but I don't know how
rsyslog should best be configured on the receiving server.
I've seen that the rsyslog 7.2 process on the server uses only one CPU at a
time, while the server has 4 CPUs and 12 GB RAM on a 64-bit OS.
5) Can some option during the configure process change this behavior?
$ ./configure --prefix=/usr/local/rsyslog7 --enable-gnutls
--enable-imfile --enable-omruleset --enable-imptcp
--enable-mmnormalize --enable-usertools --enable-imdiag
--enable-diagtools --enable-impstats
I think you are trying too hard.
rsyslog will use one thread for each input, and one thread for each
output by default (and possibly some others along the way if it sees a
need), the fact that you aren't seeing it use more than 100% CPU
indicates that you probably have some other bottleneck.
On the receiving side, have you disabled DNS lookups? That has a big hit in
many cases.
when you look at top on the central box, hit 'H' to have it display the
different threads as different lines; then you will see how much CPU is used
for each one. After that you can track down which one is the bottleneck
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.