On Wed, 5 Jun 2013, Roberto Giordani wrote:
Hello David,
your last email was very useful to me and I appreciated the explanation! :-)
I can't go into a lot of depth right now, so I'll reply more later
I've changed the client and server configuration after the benchmark test, as
you can see, but I'm still suffering on the server side
1) Attached are the two .png graphs of the bandwidth available between client
and server (about 200-300 Mbit/s)
2) The thread on the server that used 100% CPU was MainQ, so I've increased the
thread count to 20 and now all threads work at 20%, using all CPUs
try doing a strace on that thread for a few seconds and see what it's spending
its time doing
the fact that all the threads together aren't passing 100% suggests that you may
have some common bottleneck.
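As a sketch of how to do that (these are standard Linux tools; the thread ids and the rsyslogd process name are whatever your box shows):

```shell
# Show per-thread CPU usage for rsyslogd; the TID column gives the
# thread ids (the same numbers top shows after you hit 'H'):
pid=$(pidof rsyslogd)
ps -L -o tid,pcpu,comm -p "$pid"

# Pick the busiest thread and let strace summarize its syscalls for
# ten seconds (-c prints per-syscall counts/time instead of a raw trace):
tid=$(ps -L -o tid=,pcpu= -p "$pid" | sort -k2 -rn | awk 'NR==1 {print $1}')
timeout 10 strace -c -p "$tid"
```

The -c summary usually makes it obvious whether the thread is burning its time in write(), futex contention, or something else.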
3) I've reduced the number of input files for each client, selecting the most
used ones, and reduced the corresponding actions on the server; attached is the
rsyslog-pstats output of server fsp01
4) this is the client conf
$MainMsgQueueType LinkedList
$MainMsgQueueSize 1000000
#$MainMsgQueueWorkerThreads 2
$MainMsgQueueDequeueBatchSize 5000
$MainMsgQueueSaveOnShutdown on
$ActionResumeRetryCount -1 # infinite retries if host is down
$ActionQueueTimeoutEnqueue 1 # after it fills up, also the main queue fills up
$ActionSendResendLastMsgOnReconnect on # the last message is to be resent when a connection breaks and has been reconnected.
$ActionQueueCheckpointInterval 1 #
local6.debug @@fsp01:10516
5) this is the server conf
# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 10514
$InputTCPServerRun 10515
$InputTCPServerRun 10516
#### GLOBAL DIRECTIVES ####
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# File syncing capability is disabled by default. This feature is usually not
# required, not useful and an extreme performance hit
$ActionFileEnableSync off
$umask 0007
$MainMsgQueueType LinkedList
$MainMsgQueueSize 1000000
$MainMsgQueueWorkerThreads 20
$MainMsgQueueDequeueBatchSize 5000
$MainMsgQueueSaveOnShutdown on
And each config file included in the server conf has 5 actions
the actions could be the bottleneck.
When I tested, FixedArray (the default) was faster than LinkedList. Yes, it may
use more memory, but on a dedicated server this is reasonable (and you want to
tune the max size to avoid paging in any case, and a fixed allocation is easier
to check)
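In legacy directive syntax that change would look something like this (FixedArray is the default type, so simply removing the LinkedList lines has the same effect; the size shown is only an illustration):

```
$MainMsgQueueType FixedArray
# fixed allocation: pick a size that fits comfortably in RAM without paging
$MainMsgQueueSize 500000
```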
My questions today are:
a) each client has an InputFileStateFile, but my application rotates its own
files at 00:00. After logrotate, does rsyslog use the new, empty input file?
I'm not sure what you are referring to, but I believe that rsyslog will find the
new file and start from the beginning of it.
b) if for some reason I need to start the rsyslog client on a new server that
already has an input file with a size of 300M, is there a way to start logging
only the new lines inserted in that file? I think the answer is no :-)
I think that you are correct
c) you said that each client uses a thread per input file, but I see that it
starts sending one file and doesn't send messages from different input files at
the same time.
How can I speed up the sending process? The server has 20 threads to receive messages.
I think that what happens is that rsyslog starts on a file and does all the work
for the one file before moving to the next. After it's up to date on each file
it will cycle through them looking for new data
d) Due to the LinkedList mode for the main queue and action queue I don't want
to lose any messages, but if I set InputFilePersistStateInterval to 10 I get a
lot of I/O, so I have set it to 10000. Is that correct?
if you are talking about data on the order of 100,000 logs/sec, set the interval
to at least 10,000 if not 100,000. You don't want to write this data out
frequently; every second or every few seconds should be good enough. The reason
for this file is so that if rsyslog crashes it will know roughly where it left
off in the file. Under conditions like that, resending a few seconds of a file
seems reasonable to me.
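To make the trade-off concrete: at the roughly 10,000 messages/second per client mentioned earlier in the thread, the interval translates to state-file writes like this (the numbers are illustrative, not measured):

```
# interval 10     -> ~1000 state writes/sec (heavy I/O)
# interval 10000  -> ~1 state write/sec
# interval 100000 -> ~1 state write every 10 sec
$InputFilePersistStateInterval 10000
```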
How can I prevent message loss without using a disk queue?
e) how can I do the "Disabling the disk fallback..." that you mentioned?
well, if you are getting all your data from files, and you just disable throwing
away messages and retry forever, the reading of the file will block if rsyslog
can't deliver the messages, so the files act as your fallback. There is no need
to read them into rsyslog only to have rsyslog write them back out to disk again
(assuming that you fix the problem before you rotate the files twice, at least).
Just a note that writing to a file and then reading it is pretty inefficient; if
you can arrange direct delivery to rsyslog you will probably like it.
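For comparison, direct delivery can be as simple as the application writing to the local syslog socket, which imuxsock already picks up; the tag and message here are made up:

```shell
# Send one message straight to the local syslog socket (imuxsock)
# instead of appending it to a file that imfile then has to tail:
logger -p local6.debug -t TEST1 "order processed id=12345"
```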
David Lang
Thanks a lot for your support.
Regards,
Roberto.
On 06/04/2013 05:53 PM, David Lang wrote:
On Tue, 4 Jun 2013, Roberto Giordani wrote:
Hello David,
following your suggestions, these are my changes:
1) I've removed the ActionQueue on the client
2) added on the server:
$MainMsgQueueType LinkedList
$MainMsgQueueSize 100000
$MainMsgQueueWorkerThreads 3
3) the DNS lookup was already disabled with the -x option when starting rsyslog
4) I can't use UDP because my next step will be to encrypt messages
and this is the pstats report after 4 hours on the server
2013-06-04T13:09:26.398752+02:00 fsp01 rsyslogd-pstats: imtcp(10514): submitted=4583409
2013-06-04T13:09:26.773487+02:00 fsp01 rsyslogd-pstats: main Q: size=9969 enqueued=5082093 full=158414 discarded.full=0 discarded.nf=0 maxqsize=10000
now the server uses 3 of the 4 available CPUs
I've tested the client locally before forwarding to the server and I don't
observe any issues.
I would like to hear your thoughts on these parameters:
a)$MainMsgQueueDequeueBatchSize
b)$InputFilePersistStateInterval
-) Which values are correct for the previous parameters when using
$MainMsgQueueSize 1000000?
you should not need to change these values from default
I didn't receive an answer to my first question:
-) How is it possible to increase incoming messages and handle them quickly?
lots of things could be the problem, but we need to find out where the
bottleneck is in your system
-) how can I retrieve the number of messages sent and the number of messages
received?
The best way to do this is to setup a benchmark test where you know the
number of messages that are sent and can measure the number received.
-)do you think using more than one listener on the server is better?
only if the listener is the bottleneck and you have multiple senders.
If the server is now using 3 out of 4 CPUs, then it's possible that one of
the CPUs is maxing out and that is limiting your overall throughput. With a
simple "receive and write to disk" configuration, rsyslog probably only
uses a few threads, so it may not need the 4th cpu.
when you look at top with threads enabled (after hitting 'H') on the server,
is one of the syslog threads using 100% CPU (or close to it)?
Some of the places you could be having bottlenecks:
1. the receiving thread could be bottlenecking, in which case using imptcp
could be a win
2. activity on the main queue could be the bottleneck, in which case
allowing larger batch sizes could help. Disabling the disk fallback could
help, using array instead of linkedlist could help.
3. output to disk could be the bottleneck; what filesystem you are using, how
large the files get before you rotate them, and what fsync activity is taking
place on the filesystem could all cause problems. Dynamic file names do have
some overhead.
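Point 1 above can be tested with a one-line change on the server; the configure line quoted later in this thread already includes --enable-imptcp:

```
# swap the plain-tcp input in for imtcp on the same port:
$ModLoad imptcp
$InputPTCPServerRun 10514
```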
set up a test file that has a few hundred million log messages in it, then see
how long it takes to send that file. If you have enough RAM on the central box,
set your max queue size very large (1M or so) and repeat the test, looking
specifically to see if your sending box finishes noticeably before the
receiving box (if so, the bottleneck is in the output, not the input)
While I understand that you are looking to add encryption to your
connection in the near future, do a test with UDP instead of TCP just to
see if that is the bottleneck
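For that UDP test, a single '@' in the client action selects UDP ('@@' is TCP), and the server needs imudp loaded; this is a temporary test config, not a production recommendation:

```
# client side:
local6.debug @10.10.1.10:10514

# server side:
$ModLoad imudp
$UDPServerRun 10514
```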
Sometimes I find it useful to do a strace on threads that are using a lot
of cpu as the system calls that are being made let me know what that
particular thread is working on.
David Lang
Regards,
Roberto.
On 06/04/2013 08:32 AM, David Lang wrote:
On Mon, 3 Jun 2013, Roberto Giordani wrote:
Hello,
I'm working on a project where 20 RHEL 5.8 servers (with rsyslog 5.8.12) each
have 20 input files and send about 10,000 messages per second to one rsyslog
server running version 7.2
The network between client and server is gigabit, and this is the daily nmon
network traffic graph
The first configuration used a disk queue, but it was too slow, so I've
switched to the LinkedList queue.
in my testing memory was the fastest (around 1000x as fast as a full
disk queue)
The current client settings are
$InputFileName /file1.log
$InputFileTag TEST1
$InputFileStateFile file1
$InputFileSeverity debug
$InputFileFacility local6
$InputRunFileMonitor
$InputFilePersistStateInterval 10
....
....
....
############################################
$MaxMessageSize 9000
$MainMsgQueueType LinkedList
$MainMsgQueueSize 1000000
$MainMsgQueueWorkerThreads 20
$MainMsgQueueDequeueBatchSize 5000
$MainMsgQueueSaveOnShutdown on
$ActionQueueType LinkedList
$ActionQueueSize 2000000
$ActionQueueWorkerThreads 50
$ActionQueueDequeueBatchSize 5000
$ActionQueueSaveOnShutdown on
###########################################
$ActionResumeRetryCount -1
$ActionQueueTimeoutEnqueue 1
$ActionSendResendLastMsgOnReconnect on
$ActionQueueCheckpointInterval 1
local6.debug @@10.10.1.10:10514
The server settings are
##############################################
# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 10514
#### GLOBAL DIRECTIVES ####
# Use default timestamp format
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
# File syncing capability is disabled by default. This feature is usually
# not required, not useful and an extreme performance hit
$ActionFileEnableSync off
$umask 0007
if $fromhost-ip == "10.10.1.7" and $syslogfacility-text == "local6"
and $syslogseverity-text == "debug" and $syslogtag == "TEST1" then
action(type="omfile" DirOwner="user1" DirCreateMode="0750"
FileCreateMode="0444" File="/rsyslog-data/file1.log")
if $fromhost-ip == ........
###############################
My questions are:
1) how do I find the right combination of main queue ---> action queue --->
receiver queue on the rsyslog server?
why do you need an Action Queue? it's only needed if you have some
reason to think that that particular action is going to fall behind and
you want to decouple it from the rest of the outputs.
2) Is it possible to increase incoming messages and handle them quickly?
Yes, I've processed 380K logs/sec (gig-e limit) and others with 10G
networks have processed 1M logs/sec
3) Why are the output logs on the centralized rsyslog server still behind the
source logs on the client?
lots of possible reasons, but first off what are you looking at that's
saying that the central servers are behind?
start off by seeing if you are having problems reading the data from the file.
Set up a test where you just read from the file and write back out to a file
(I'd suggest starting without any queue directives at all). Then, if that's
fast enough, change it to send remotely and report what's happening
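A minimal self-test config along those lines, with no queue directives at all (the file names are the placeholders from the client config quoted above; the output path is made up):

```
$ModLoad imfile
$InputFileName /file1.log
$InputFileTag TEST1
$InputFileStateFile file1-selftest
$InputFileSeverity debug
$InputFileFacility local6
$InputRunFileMonitor

# write straight back out to a local file; if this keeps up with the
# source, the reading side is not the problem
local6.debug /var/log/imfile-selftest.log
```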
I've activated the pstats module to understand the queue status on
clients and server.
After 6 hours of running, the client has this report
.........
2013-06-03T22:23:16.708288+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9327675 full=6831507 maxqsize=2000000
2013-06-03T22:23:16.708297+02:00 app01 rsyslogd-pstats: main Q: size=2 enqueued=9648448 full=0 maxqsize=20395
2013-06-03T22:23:46.708367+02:00 app01 rsyslogd-pstats: imuxsock: submitted=323414 ratelimit.discarded=0 ratelimit.numratelimiters=29974
2013-06-03T22:23:46.708382+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9340578 full=6844410 maxqsize=2000000
2013-06-03T22:23:46.708390+02:00 app01 rsyslogd-pstats: main Q: size=2164 enqueued=9666464 full=0 maxqsize=20395
2013-06-03T22:24:16.708923+02:00 app01 rsyslogd-pstats: imuxsock: submitted=328198 ratelimit.discarded=0 ratelimit.numratelimiters=29986
2013-06-03T22:24:16.708941+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9355649 full=6859481 maxqsize=2000000
2013-06-03T22:24:16.708949+02:00 app01 rsyslogd-pstats: main Q: size=1364 enqueued=9686593 full=0 maxqsize=20395
2013-06-03T22:24:46.709300+02:00 app01 rsyslogd-pstats: imuxsock: submitted=333070 ratelimit.discarded=0 ratelimit.numratelimiters=29997
2013-06-03T22:24:46.709316+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9365276 full=6869108 maxqsize=2000000
2013-06-03T22:24:46.709323+02:00 app01 rsyslogd-pstats: main Q: size=2123 enqueued=9702047 full=0 maxqsize=20395
2013-06-03T22:25:16.709807+02:00 app01 rsyslogd-pstats: imuxsock: submitted=337951 ratelimit.discarded=0 ratelimit.numratelimiters=30009
2013-06-03T22:25:16.709823+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9379492 full=6883324 maxqsize=2000000
2013-06-03T22:25:16.709832+02:00 app01 rsyslogd-pstats: main Q: size=2 enqueued=9719723 full=0 maxqsize=20395
2013-06-03T22:25:46.709942+02:00 app01 rsyslogd-pstats: imuxsock: submitted=343014 ratelimit.discarded=0 ratelimit.numratelimiters=30021
2013-06-03T22:25:46.709980+02:00 app01 rsyslogd-pstats: action 9 queue: size=2000000 enqueued=9389640 full=6893472 maxqsize=2000000
and the rsyslog server has the following stats
.......
2013-06-03T23:05:15.898682+02:00 fsp01 rsyslogd-pstats: main Q: size=9978 enqueued=5032165 full=156941 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:06:56.157199+02:00 fsp01 rsyslogd-pstats: main Q: size=9987 enqueued=5065134 full=157971 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:08:30.657673+02:00 fsp01 rsyslogd-pstats: main Q: size=9972 enqueued=5096315 full=158942 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:10:09.895850+02:00 fsp01 rsyslogd-pstats: main Q: size=9986 enqueued=5129162 full=159969 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:11:42.488505+02:00 fsp01 rsyslogd-pstats: main Q: size=9973 enqueued=5159935 full=160933 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:13:23.213800+02:00 fsp01 rsyslogd-pstats: main Q: size=9973 enqueued=5193246 full=161973 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:14:58.833570+02:00 fsp01 rsyslogd-pstats: main Q: size=9970 enqueued=5224922 full=162962 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:16:35.184133+02:00 fsp01 rsyslogd-pstats: main Q: size=9975 enqueued=5256863 full=163960 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:18:13.992958+02:00 fsp01 rsyslogd-pstats: main Q: size=9991 enqueued=5289392 full=164977 discarded.full=0 discarded.nf=0 maxqsize=10000
2013-06-03T23:19:52.464473+02:00 fsp01 rsyslogd-pstats: main Q: size=9942 enqueued=5322013 full=165996 discarded.full=0 discarded.nf=0 maxqsize=10000
4) Why does the enqueued value on the client never decrease, and why is full=
always different from 0?
well, once you have enqueued something, you don't go back in time and
not enqueue it, so unless you restart rsyslog, that number is going to
just keep climbing.
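Since the counters are cumulative, the number to watch is the delta between two samples divided by the sample interval. Using two of the "main Q" client samples above, which are 30 seconds apart:

```shell
# cumulative enqueued counts from two consecutive pstats samples:
e1=9648448   # sample at 22:23:16
e2=9666464   # sample at 22:23:46
echo $(( (e2 - e1) / 30 ))   # prints 600, i.e. ~600 msgs/sec enqueued
```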
I'm planning to distribute the 20 clients across 2 rsyslog server processes on
different ports on the same server, but I need your help with the right
combination of action queue on the client and main queue on the server.
In the rsyslog documentation I've found a lot of info about handling the input
queue (number of threads, max messages and DequeueBatch), but I don't know how
rsyslog should best be configured on the receiving server.
I've seen that the rsyslog 7.2 process on the server uses only one CPU at a
time, while the server has 4 CPUs and 12 GB RAM on a 64-bit OS.
5) Can some option during the configure process change this behavior?
$ ./configure --prefix=/usr/local/rsyslog7 --enable-gnutls
--enable-imfile --enable-omruleset --enable-imptcp
--enable-mmnormalize --enable-usertools --enable-imdiag
--enable-diagtools --enable-impstats
I think you are trying too hard.
rsyslog will use one thread for each input, and one thread for each
output by default (and possibly some others along the way if it sees a
need), the fact that you aren't seeing it use more than 100% CPU
indicates that you probably have some other bottleneck.
On the receiving side, have you disabled DNS lookups? That has a big hit in
many cases.
when you look at top on the central box, hit 'H' to have it display the
different threads as different lines; then you will see how much CPU is used
for each one. After that you can track down which one is the bottleneck
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.