On 11/21/2012 02:59 PM, Rainer Gerhards wrote:
-----Original Message-----
From: [email protected] [mailto:rsyslog-
[email protected]] On Behalf Of Risto Vaarandi
Sent: Wednesday, November 21, 2012 1:25 PM
To: [email protected]
Subject: [rsyslog] rsyslog message formatting for elasticsearch

hi all,

I apologize in advance if this question has been asked before. I have
been playing with the omelasticsearch module recently and it works well for
me. According to my tests, its performance is clearly superior to some
Java tools which have been used for Elasticsearch in the past.
I began experimenting with a configuration given at
http://wiki.rsyslog.com/index.php/HOWTO:_rsyslog_%2B_elasticsearch
and tried to elaborate it into a more advanced configuration.

My question is about advanced parsing of log messages and extracting
additional fields from message content. In my environment, I have a lot
of IDS alarms, e.g.,

Nov 21 12:31:41 myhost snort[17449]: [1:2014527:1] ET CURRENT_EVENTS
Exploit Kit Delivering Compressed Flash Content to Client
[Classification: Potentially Bad Traffic] [Priority: 2] {TCP}
10.1.1.1:80 ->  10.2.2.2:51601

I would like to extract some fields like signature ID, transport
protocol, source IP, etc. from each alarm, create a JSON record, and
write it into the Elasticsearch database.

In order to address the problem of data extraction, I have used rsyslog
property replacers. For example, to extract the main message fields plus
the signature ID and the protocol, this template could be used:

$template SnortTemplate,"{\"timestamp\":\"%timereported:::date-rfc3339%\",\"message\":\"%msg:::json%\",\"host\":\"%HOSTNAME:::json%\",\"sig\":\"%msg:R,ERE,1:\[([0-9]+:[0-9]+):[0-9]+\]--end%\",\"proto\":\"%msg:R,ERE,1:\{([A-Z]+)\} [0-9.]+--end%\"}"

The signature extraction is done with
%msg:R,ERE,1:\[([0-9]+:[0-9]+):[0-9]+\]--end%

while protocol is extracted with
%msg:R,ERE,1:\{([A-Z]+)\} [0-9.]+--end%

However, while this approach works for me, it requires a separate
regular expression match for each additional field. My question is --
are there any better ways for accomplishing this task?
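
To make the goal concrete, here is a minimal Python sketch (not rsyslog configuration) of pulling every field out of the sample alarm in one pass instead of running one regular expression per field. The group names, the `parse_alarm` helper, and the exact shape of the resulting record are illustrative assumptions, not anything rsyslog or omelasticsearch defines.

```python
import json
import re

# Sample alarm from the post, joined back onto one line (MSG part only).
MSG = ("[1:2014527:1] ET CURRENT_EVENTS Exploit Kit Delivering Compressed "
       "Flash Content to Client [Classification: Potentially Bad Traffic] "
       "[Priority: 2] {TCP} 10.1.1.1:80 -> 10.2.2.2:51601")

# One pattern with named groups instead of one pattern per field.
# The group names are hypothetical, chosen only for this illustration.
ALARM_RE = re.compile(
    r"\[(?P<sig>[0-9]+:[0-9]+):[0-9]+\] "
    r"(?P<description>.*?) "
    r"\[Classification: (?P<classification>[^\]]+)\] "
    r"\[Priority: (?P<priority>[0-9]+)\] "
    r"\{(?P<proto>[A-Z]+)\} "
    r"(?P<src_ip>[0-9.]+):(?P<src_port>[0-9]+) -> +"
    r"(?P<dst_ip>[0-9.]+):(?P<dst_port>[0-9]+)"
)


def parse_alarm(msg):
    """Return the extracted fields as a dict, or None if msg does not match."""
    match = ALARM_RE.search(msg)
    return match.groupdict() if match else None


fields = parse_alarm(MSG)
print(json.dumps(fields, indent=2))
```

The `sig` and `proto` groups reuse the two expressions from the template above, so they should yield the same values that the property replacers extract.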


You should look into mmnormalize (based on liblognorm). This is a classic use
case for it. Some documentation:

http://www.rsyslog.com/doc/mmnormalize.html
http://www.rsyslog.com/tag/mmnormalize/

Another question -- for writing into elasticsearch, I've used
omelasticsearch with 'bulkmode' enabled and queue batch sizes set to
higher values:

$MainMsgQueueDequeueBatchSize 1024
$ActionQueueDequeueBatchSize 512

$template SnortIndex,"rsyslog-%timereported:1:10:date-rfc3339%"

if $programname == 'snort' then action(type="omelasticsearch"
    template="SnortPayload" dynSearchIndex="on" searchIndex="SnortIndex"
    server="localhost" bulkmode="on")

Are there any other ways for increasing the throughput?

It depends a bit on the overall workflow, but what I see doesn't look bad. I'd
probably increase the batch sizes even further if you have a heavily loaded system.
10240 for both is not evil.
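
As a rough illustration of the two settings discussed above, the Python sketch below mirrors what the SnortIndex template produces (characters 1 to 10 of the RFC 3339 timestamp, giving one index per day) and counts how many bulk requests a burst of messages would need at different batch sizes. It assumes that each dequeued batch turns into exactly one bulk request, which is a simplification; the helper names, the sample timestamp offset, and the figure of 100,000 messages are hypothetical.

```python
import math


def daily_index(timestamp_rfc3339, prefix="rsyslog-"):
    """Mirror %timereported:1:10:date-rfc3339%: keep characters 1..10."""
    return prefix + timestamp_rfc3339[:10]


def bulk_requests(message_count, batch_size):
    """Requests needed, ASSUMING every dequeued batch is one bulk request."""
    return math.ceil(message_count / batch_size)


# The timezone offset in this timestamp is made up for the example.
print(daily_index("2012-11-21T12:31:41+02:00"))

# Compare the batch sizes mentioned in the thread for a hypothetical burst.
for size in (512, 1024, 10240):
    print(size, bulk_requests(100_000, size))
```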

Rainer

Rainer,
thanks for the feedback!

I have looked into mmnormalize and done some testing with lognormalizer. From the documentation, I got the impression that to match free-form text up to a certain delimiting character, the char-to:character field type has to be used. (It is somewhat similar to the non-greedy .*? in the PCRE dialect.)

However, in some more complex cases the delimiter can also appear inside the free-form text, and then a more elaborate check has to be carried out on the characters that follow the delimiter. Regular expressions allow for this kind of matching: for example, '.*?:[0-9]+' moves on to the next colon if the current one is not followed by a sequence of digits. From tests with lognormalizer, I got the impression that char-to:character always settles for the first instance of the given character.
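
The difference can be sketched in a few lines of Python. This is not liblognorm code: `first_delimiter` only mimics the behaviour observed with char-to (stop at the first delimiter), while `delimiter_before_digits` uses the non-greedy expression from the paragraph above. The input string and both helper names are made up for the example.

```python
import re

# Hypothetical input: free-form text that itself contains the delimiter ':'.
TEXT = "session user:admin opened:8080 rest"


def first_delimiter(text, delim=":"):
    """Mimic the observed char-to behaviour: stop at the first delimiter."""
    head, sep, tail = text.partition(delim)
    return (head, tail) if sep else None


def delimiter_before_digits(text):
    """Non-greedy match that backtracks until a colon is followed by digits."""
    match = re.match(r"(.*?):([0-9]+)", text)
    return match.groups() if match else None


print(first_delimiter(TEXT))
print(delimiter_before_digits(TEXT))
```

The first call splits at "user:", even though "admin" is not a port number, whereas the regular expression skips that colon and settles on "opened:8080".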

Is there any way to implement more complex parsing that searches for an alternative match if the first attempt fails? Is there any support for regular expressions in mmnormalize?

with kind regards,
risto

