Thanks David & RobertM for your replies. impstats definitely looks like it 
would help me understand what's going on - will use that tomorrow and repeat 
the load tests.



As for rsyslog, I'm using version 8.2.1 on 64-bit RHEL 6.5.



When you say disable DNS lookups, you mean running with the -x option, yes?



As for what I'm doing with logs, it goes like this (pulling out the relevant 
parts from multiple files, with internal hostnames sanitized):



input(type="imudp" port="514" ruleset="MainRoutingEngine")

ruleset(name="MainRoutingEngine"
        queue.type="fixedArray"
        queue.size="250000"
        queue.dequeueBatchSize="4096"
        queue.workerThreads="4"
        queue.workerThreadMinimumMessages="60000"
        ) {
        call DataArchiving
        #call DataForwarding
}

ruleset(name="DataArchiving"
        queue.type="fixedArray"
        queue.size="250000"
        queue.dequeueBatchSize="4096"
        queue.workerThreads="4"
        queue.workerThreadMinimumMessages="60000"
        ) {

        action( type="omfile"
                file="/data/logs/incoming-all.log"
                ioBufferSize="64k"
                flushOnTXEnd="off"
                asyncWriting="on")

        if $syslogtag == 'LEEF:' and $msg contains 'Websense' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthWebsense"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if $rawmsg contains '_SIEM' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthCiscoWSASquid"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if $rawmsg contains 'SFIMS:' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthSourcefire"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if $rawmsg contains 'fenotify-' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthFireeye"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if $hostname == 'server.com' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthTippingpoint"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if re_match($rawmsg, 'a|b|c|d' ) then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthJuniperSSLVPN"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if $syslogtag startswith 'MSWinEventLog' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthWindows"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if $rawmsg contains 'RT_FLOW: RT_FLOW_SESSION' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthSRXFW"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if re_match($msg, '%ASA-[0-9]-[0-9]{6}' ) then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthCiscoASA"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if re_match($rawmsg, '%PIX-[0-9]-[0-9]{6}' ) then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthCiscoPIX"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if re_match($rawmsg, '%FWSM-[0-9]-[0-9]{6}' ) then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthCiscoFWSM"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if $msg contains '%SEC-6-IPACCESSLOG' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthCiscoACLRouter"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        if $rawmsg contains 'CSCOacs' then {
                action( type="omfile"
                        dynaFile="sortByHostnameYearMonthCiscoACS"
                        ioBufferSize="64k"
                        flushOnTXEnd="off"
                        asyncWriting="on"
                        )
                stop
        }

        action( type="omfile"
                dynaFile="sortByHostnameYearMonthLog"
                ioBufferSize="64k"
                flushOnTXEnd="off"
                asyncWriting="on")
}



The files are written to a glusterfs file system sitting on top of XFS on an 
iSCSI device. iSCSI has its own dedicated interface (eth1) and so does 
glusterfs (eth2); both are separate from the incoming syslog traffic (eth0). 
iostat output during the load test showed no strain on the disk I/O subsystem.



Thanks,

-Bond



-----Original Message-----

From: [email protected] 
[mailto:[email protected]] On Behalf Of David Lang

Sent: Wednesday, July 02, 2014 11:06 PM

To: rsyslog-users

Subject: Re: [rsyslog] question on reliability of message processing



On Wed, 2 Jul 2014, Masuda, Bond wrote:



> hi rsyslogers:

>

> we've been doing some load testing of syslog messages over UDP/514 to 
> rsyslog. we write all incoming messages to a file in 
> /data/logs/incoming-all.log.

>

> In our load test, we generated about 29 million messages in 300 seconds. On 
> the server side, we are receiving about 25 million messages; and about 4 
> million messages are lost on the network (not an rsyslog issue). However, of 
> the 25 million messages we know arrive at the server, we are also seeing 
> message loss in /data/logs/incoming-all.log, albeit to a much lesser degree 
> than the network problem.

>

> The actual numbers are:

>

> 29,561,113 messages generated and sent in 300 seconds

> 24,802,441 messages arrive at the rsyslog server (counting UDP packets via 
> NETFILTER/mangle-PREROUTING accounting rule)

> 24,774,587 messages written to /data/logs/incoming-all.log

>

> So it would seem that we lost 27,854 messages within rsyslog.

>

> My question is this:

>

> 1. Does rsyslog drop messages when its message queues are overflowing?



Yes, if the queue is full, what else can rsyslog do?



> 2. If answer to #1 is yes, does it keep any accounting of the lost messages 
> and how can I see those numbers? or at least warn that its queues are 
> overflowing?



yes, enable impstats



> 3. if answer to #1 is yes, is there some configuration setting to make 
> rsyslog guarantee not to drop messages, potentially as trade off with some 
> other problem? Or is it just a matter of increasing queue sizes?



not with UDP, because if rsyslog can't deal with the message _NOW_ the OS will
drop the packet
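
A rough sketch of what can be tuned on the receiving side to make those OS-level drops less likely (it cannot rule them out). imudp has a per-input rcvbufSize parameter and module-level threads and batchSize parameters. The values below are placeholders chosen for illustration, not tested recommendations, and whether 8.2.1 accepts all three should be checked against the imudp documentation for that version. On Linux the kernel may also cap the socket buffer (net.core.rmem_max).

# assumed values for illustration only, not tuned for this workload
module(load="imudp" threads="2" batchSize="128")

input(type="imudp" port="514"
        rcvbufSize="4m"
        ruleset="MainRoutingEngine")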



> 4. If answer to #1 is no, what's the best way to go about troubleshooting why 
> messages are being lost?





> BTW, under less stressful conditions, all the in/out numbers perfectly match.

> We only start seeing "lost messages/packets" when we go above ~50,000 messages

> per second.



first question: what version are you running? 8.2 is worlds better than 5.x



after that, have you tried disabling DNS lookups?



run tcpdump (saving to a file, not outputting to the screen) to check if the
packets are really being lost on the network or if they are being lost in the
network stack of the receiving machine



configure impstats to see what your queue sizes look like
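
A minimal sketch of such an impstats setup, assuming a 10-second reporting interval and a local stats file. The interval and the file path are placeholders made up for illustration; adjust both to taste.

# interval and log.file are assumed values, not recommendations
module(load="impstats"
        interval="10"
        severity="7"
        log.syslog="off"
        log.file="/var/log/rsyslog-stats.log")

Each queue then gets its own line in the stats file with counters such as size, enqueued, full, discarded.full and maxqsize. A nonzero discarded.full on one of the ruleset queues would point at that queue overflowing.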



what are you doing with the logs? it's possible that your bottleneck is in
outputting the logs and fixing that will solve your problem



you could configure disk assisted queues to spool logs to disk, but that is
slower than just outputting them to a file, so it doesn't solve the problem
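
For reference, a disk-assisted queue is an in-memory queue that also has queue.filename set. A sketch of what that could look like on the archiving ruleset follows; the spool directory, the file prefix and the disk limit are all made-up values for illustration, not tuned settings.

# workDirectory, queue.filename and queue.maxDiskSpace are assumed values
global(workDirectory="/var/spool/rsyslog")

ruleset(name="DataArchiving"
        queue.type="linkedList"
        queue.filename="archiving_q"
        queue.maxDiskSpace="1g"
        queue.saveOnShutdown="on"
        ) {
        # existing omfile actions go here unchanged
}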



a tuned rsyslog setup can do gig-E wire speed or faster, you are well below that
limit, so we should be able to help you.



David Lang

_______________________________________________

rsyslog mailing list

http://lists.adiscon.net/mailman/listinfo/rsyslog

http://www.rsyslog.com/professional-services/

What's up with rsyslog? Follow https://twitter.com/rgerhards

NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.