On 09/06/2018 12:00 PM, Noriko Hosoi via rsyslog wrote:
Thank you for your response, David.
On 09/06/2018 06:18 AM, David Lang wrote:
There is an option for imfile to split a file into multiline messages based on
a regex, that is probably the right starting point.
Yes, we also expected startmsg.regex could be used for our purpose.
https://www.rsyslog.com/doc/v8-stable/configuration/modules/imfile.html#startmsg-regex
In our case, if we set "/.* stdout P .*/" to startmsg.regex. It also matches the next partial log "<TIMESTAMP_n+1> stdout P partial_log_1" and is considered the first part of the another
multiline message. (Please correct me if I'm wrong...)
<TIMESTAMP_0> stdout F full_log_0
........................
<TIMESTAMP_n-1> stdout F full_log_n-1
<TIMESTAMP_n> stdout P partial_log_0
<TIMESTAMP_n+1> stdout P partial_log_1
<TIMESTAMP_n+2> stdout F rest_of_partial_log
<TIMESTAMP_n+3> stdout F full_log_n+3
I thought if there were an optional option, e.g., "endmsg.regex" which
specifies the end of the multiline, we may be able to solve the problem. But again I
could be wrong.
Another issue is when merging the multiline messages into one, we want to so
just with the message part. Using this example:
<TIMESTAMP_n> stdout P partial_log_0
<TIMESTAMP_n+1> stdout P partial_log_1
<TIMESTAMP_n+2> stdout F rest_of_partial_log
We want to assemble
<TIMESTAMP_n+2> stdout F partial_log_0 partial_log_1 rest_of_partial_log
instead of
<TIMESTAMP_n> stdout P partial_log_0 <TIMESTAMP_n+1> stdout P partial_log_1
<TIMESTAMP_n+2> stdout F rest_of_partial_log
Do you think it's doable in imfile?
Even with endmsg.regex support, what we want is to concatenate the _message_
fields together, not the _entire lines_. That is, if the input is this:
<TIMESTAMP_n> stdout P partial_log_0
<TIMESTAMP_n+1> stdout P partial_log_1
<TIMESTAMP_n+2> stdout F rest_of_partial_log
We want the final output to be
<TIMESTAMP_n> stdout F partial_log_0partial_log_1rest_of_partial_log
Not
<TIMESTAMP_n> stdout P partial_log_0 <TIMESTAMP_n+1> stdout P partial_log_1
<TIMESTAMP_n+2> stdout F rest_of_partial_log
Another problem is that if imfile is spewing out messages with partial lines in them, there is no guarantee that a subsequent filter is going to see the records in order, due to the
multi-threaded nature of rsyslog - so we need a way to uniquely identify and absolutely order each message. Fortunately, we have 3 message fields to use - $!metadata!filename and
$!metadata!offset uniquely identify the message within each file, and we can use the TIMESTAMP if we need to order messages in files that may have been rotated. Then we'll need to implement
something like https://github.com/fluent-plugins-nursery/fluent-plugin-concat#usage "Handle containerd/cri in Kubernetes" in rsyslog.
Alternately, we could handle this in imfile if we extended imfile to have
support for pluggable parsers. The plugin would provide an entrypoint e.g. a
function parse(char *line)
- imfile reads a line, calls parse(line)
- parser parses the line (which means it will need to call mmnormalize/mmjsonparse . . .), checks for multiline conditions - if multiline, parser returns with some value to imfile indicating
'I need another line'
- parser reads all of the multiple lines, returns the complete message to imfile - that is, it concatenates the "log" fields - not sure what it should do with the "time" field, perhaps use
only the first or last or ??? presumably the "stream" fields all have the same value. It could stick the "time" fields in a local variable e.g. $.imfile!firsttime and $.imfile!lasttime which
are the "time" fields read from the first and last lines in the message, respectively. I'm assuming the user will either want to associate the first time or the last time with the message as
the actual "@timestamp" for the message.
Note that this isn't specific to cri-o - docker --log-driver=json-file logs
have the same problem of having to first parse the log to determine if it is
multiline or not.
Thanks,
--noriko
Note that a LOT of log processing tools assume a log message is a single line,
so you probably want to have newlines escaped in the message before sending it
to other tools for processing.
David Lang
On Wed, 5 Sep 2018, Noriko Hosoi via rsyslog wrote:
Date: Wed, 5 Sep 2018 14:04:42 -0700
From: Noriko Hosoi via rsyslog <rsyslog@lists.adiscon.com>
To: rsyslog@lists.adiscon.com
Cc: Noriko Hosoi <nho...@redhat.com>
Subject: [rsyslog] Question on multiline log messages
Hello, Rsyslog List,
We have a requirement to process multiline log messages in a log file.
The simplified log format looks like this.
<TIMESTAMP_0> stdout F full_log_0
........................
<TIMESTAMP_n-1> stdout F full_log_n-1
<TIMESTAMP_n> stdout P partial_log_0
<TIMESTAMP_n+1> stdout P partial_log_1
<TIMESTAMP_n+2> stdout F rest_of_partial_log
<TIMESTAMP_n+3> stdout F full_log_n+3
In this example, the first "/.* stdout P .*/" tells it's the beginning of the multiline message. The next intermediate line also matches "/.* stdout P .*/", which may not exist or could
repeat until it hits "/.* stdout F .*/". Please note that 'P' stands for Partial; 'F' for Full.
Since the messages are logged in a file, we'd like to use the imfile plugin to
read logs from the file and merge the multiline messages into one line as
follows.
<TIMESTAMP_0> stdout F full_log_0
........................
<TIMESTAMP_n-1> stdout F full_log_n-1
<TIMESTAMP_n+2> stdout F partial_log_0 partial_log_1 rest_of_partial_log
<TIMESTAMP_n+3> stdout F full_log_n+3
Additionally, the split into the multiline message could occur periodically and
there are multiple log files to be processed simultaneously.
If you are curious, it is the log format from the cri-o container.
Do you happen to have an experience to configure rsyslog to fulfill such
requirements? If you have and could share them with us, we'd greatly
appreciate it.
Thanks,
--noriko
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.