On Fri, Sep 27, 2013 at 3:08 AM, David Lang <[email protected]> wrote:
> did you capture top during this time?
>
> since you are writing to NFS, iostat doesn't give us any useful info
>
If Robert followed my request, he is writing to local disk. And the iostats
indeed look like this (scroll a bit down, you'll see even periods of 100%
utilization).
> at this point the queues is not showing that it ever filled up:
>
> Thu Sep 26 15:30:40 2013: imuxsock: submitted=6 ratelimit.discarded=0
> ratelimit.numratelimiters=6
> Thu Sep 26 15:30:40 2013: action 1: processed=72558080 failed=0
> Thu Sep 26 15:30:40 2013: action 2: processed=14243880 failed=0
> Thu Sep 26 15:30:40 2013: action 3: processed=15743143 failed=0
> Thu Sep 26 15:30:40 2013: action 4: processed=11382168 failed=0
> Thu Sep 26 15:30:40 2013: action 5: processed=9334973 failed=0
> Thu Sep 26 15:30:40 2013: action 6: processed=11212392 failed=0
> Thu Sep 26 15:30:40 2013: action 7: processed=11048225 failed=0
> Thu Sep 26 15:30:40 2013: action 8: processed=11177134 failed=0
> Thu Sep 26 15:30:40 2013: imudp(*:514): submitted=160186128
> Thu Sep 26 15:30:40 2013: imudp(*:514): submitted=0
> Thu Sep 26 15:30:40 2013: main Q: size=3487134 enqueued=160186144 full=0
> discarded.full=0 discarded.nf=0 maxqsize=7742133
>
> now, one thing is that it looks like at the time you cut off the report,
> rsyslog has not finished processing the message, there are still
> size=3487134 messages in the queue, which is ~10 seconds worth of data
>
> this shows everything working fine without a problem (once the logs are in
> rsyslog), the peak queue size is about double what the current size is at
> the end of the test, but it's still only about 1/3 the max queue size. This
> means that we are no longer in the situation where the output and queue
> management is out problem (which we were at one point earlier where the
> queues were filling up)
>
yes and no ... for this config, I wouldn't expect a real queue buildup.
After all, it shows that the current config cannot run at this sustained
traffic rate. Sooner or later the queue would be overrun. We've just moved
that point.
>
> how many messages did you send in this test? Do you know what the logs
> that Spirent is sending look like over the wire? (if not, could you capture
> a sample of each log with the template RSYSLOG_DebugFormat so that we can
> see exactly what is being sent)
>
>
> There are two different performance numbers that are interesting to know
>
> 1. what is the max rate that rsyslog can accept messages without loosing
> any, assuming there is available queue.
>
> 2. what is the max rate that rsyslog can process and output messages
>
> With a lossy protocol like UDP, the first rate needs to be significantly
> higher than the second rate. In the real world, logs are very bursty, but
> if you average the log throughput over even a 10 second window, the average
> logs/sec drops drastically.
>
> when benchmarking, you need to make sure that the test runs long enough to
> get the system to a steady state. Given the fact that the max queue size is
> still increasing at the end of your sample time, I'm not sure the test is
> running long enough to show this
>
IMHO the buildup itself is good enough indication that we cannot keep this
rate at steady state (thus the impstats logs are so useful, they show even
hidden problems).
>
>
> the grep testing that you are doing could be showing either one (depending
> on if the timestamp that you are using for your grep is in the log message
> provided to rsyslog, or if this is the timestamp that rsyslog is putting on
> the message.
>
TBH, I am not at all interested in that number[at this time], as it tells
us less than impstats ;)
>
> given that this latest run shows no sign of the queue filling up, I
> suspect that this timestamp is created by rsyslog. do you know exactly what
> the spirent is sending out? I suspect it's not including the timestamp
>
>
>
> At the risk of confusing things, in future tests, please do the following.
>
> stop rsyslog, rm all output files, start rsyslog, start top, start
> generating traffic (known number of packets at a known rate), wait until
> the system goes idle, if you can, note how long after the spirent stops
> sending traffic it takes for rsyslog to become idle, stop top, stop rsyslog.
>
> if the test is running for multiple minutes, we may want to setup a
> process that rolls the log files so we can see for sure how many are output
> in each minute, but let's hold off on discussing the details of this a bit.
>
> report samples from top during the flood, and after spirent stopped
> sending messages.
>
> count the total number of log messages in these files and compare them to
> what should have been there (i.e. what spirent sent)
>
> go ahead and do some sample greps for timestamps, or even better
>
> cut -c -16 * |sort -n |uniq -c
>
> This will show per-second counts. you can then decide if you want to send
> us the full dump, or just a sample :-)
>
> some of this you can do retroactively for this last test, some you must
> plan ahead for :-)
>
I see where you are coming from, but I think we can keep simpler for the
moment. There is lot out get out of impstats, so a simpler environment is
able to tell us about exceeding the sustained rate. But I agree that for a
full test, what you say is necessary. I just think we currently need no
full test.
>
>
> There is one imudp option that you are not using, batchsize, this requires
> a kernel >= 2.6.34 and glibc >= 2.12, so if you are using anything older
> this won't make any difference. What OS version are your running again?
>
> If you are running something new enough, run this same config with
> batchsize="10" right after TimeRequery="10" and let's see what difference
> that makes (changing nothing else in the config)
>
Please don't do at the moment. I have a test plan that will lead me to
useful results, and any config change other than the simple steps I have on
my mind will invalidate it. I have not fully conveyed this so far as I
feared too much would happen in parallel. Thus I am doing the (necessary!)
baby-steps. I'll send a follow-up note (to untangle the conversation a bit)
with a rough description and the next test I would like to see carried out.
Note that
Rainer
>
> David Lang
>
>
> ~On Thu, 26 Sep 2013, Robert wrote:
>
> Hello Rainer, David,
>>
>> The spirent is set to send out 300k fps. Attached are the pstats,
>> mpstats, iostats., and its writing locally, this is what I get back:
>>
>> [rcortiz@simon test]$ ls
>> f_ad f_fw f_mail f_networks f_pix f_router f_shib f_vm
>> [rcortiz@simon test]$ sudo grep 15:20:00 * | wc -l
>> 259910
>> [rcortiz@simon test]$ sudo grep 15:20:10 * | wc -l
>> 280150
>>
>
> pasting in the config again here (since your mail client mangles quoted
> text, loosing line breaks)
>
>
> #### MODULES ####
>
> module(load="imuxsock") # needs to be done just once Roberto 8-9-13
> module(load="imklog")
> module(load="impstats" interval="30" log.syslog="off"
> log.file="/local_path/pstats")
>
> # Provides UDP syslog reception
> module(load="imudp"
> TimeRequery="10"
> SchedulingPolicy="fifo"
> SchedulingPriority="10") # needs to be done just once Roberto 8-9-13
> input(type="imudp" port="514") # Roberto 8-9-13
>
> #### GLOBAL DIRECTIVES ####
>
> # Use imprecise-old-style timestamp format
>
> $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
>
> $MainMsgQueueType FixedArray
> $MainMsgQueueSize 20000000
> $MainMsgQueueWorkerThreads 2
> $MainMsgQueueDequeueBatchSize 1000
> $InputUDPMaxSessions 40000000
>
> #### RULES ####
>
> :hostname, contains, "fdfw" {
> action (type="omfile"
> DirCreateMode="0700"
> FileCreateMode="0644"
> File="/local_path/f_fw"
> FlushOnTXEnd="on"
> IOBufferSize="8k"
> )
> }
>
> :hostname, contains, "mail" {
> action (type="omfile"
> DirCreateMode="0700"
> FileCreateMode="0644"
> File="/local_path/f_mail"
> FlushOnTXEnd="on"
> IOBufferSize="8k"
> )
> }
>
> :hostname, contains, "shib" {
> action (type="omfile"
> DirCreateMode="0700"
> FileCreateMode="0644"
> File="/local_path/f_shib"
> FlushOnTXEnd="on"
> IOBufferSize="8k"
> )
> }
>
> :hostname, contains, "pdc" {
> action (type="omfile"
> DirCreateMode="0700"
> FileCreateMode="0644"
> File="/local_path/f_ad"
> FlushOnTXEnd="on"
> IOBufferSize="8k"
> )
> }
>
> :hostname, contains, "networks" {
> action (type="omfile"
> DirCreateMode="0700"
> FileCreateMode="0644"
> File="/local_path/f_networks"
> FlushOnTXEnd="on"
> IOBufferSize="8k"
> )
> }
>
> :hostname, contains, "rout" {
> action (type="omfile"
> DirCreateMode="0700"
> FileCreateMode="0644"
> File="/local_path/f_router"
> FlushOnTXEnd="on"
> IOBufferSize="8k"
> )
> }
>
> :hostname, contains, "vm" {
> action (type="omfile"
> DirCreateMode="0700"
> FileCreateMode="0644"
> File="/local_path/f_vm"
> FlushOnTXEnd="on"
> IOBufferSize="8k"
> )
> }
>
> :hostname, contains, "pix" {
> action (type="omfile"
> DirCreateMode="0700"
> FileCreateMode="0644"
> File="/local_path/f_pix"
> FlushOnTXEnd="on"
> IOBufferSize="8k"
> )
> }
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.