On 09/06/2018 12:00 PM, Noriko Hosoi via rsyslog wrote:
Thank you for your response, David.

On 09/06/2018 06:18 AM, David Lang wrote:
There is an option for imfile to split a file into multiline messages based on 
a regex, that is probably the right starting point.
Yes, we also expected that startmsg.regex could serve our purpose.
https://www.rsyslog.com/doc/v8-stable/configuration/modules/imfile.html#startmsg-regex

In our case, if we set startmsg.regex to "/.* stdout P .*/", it also matches the next partial line "<TIMESTAMP_n+1> stdout P partial_log_1", which is then treated as the first part of another multiline message.  (Please correct me if I'm wrong...)
  <TIMESTAMP_0> stdout F full_log_0
    ........................
  <TIMESTAMP_n-1> stdout F full_log_n-1
  <TIMESTAMP_n> stdout P partial_log_0
  <TIMESTAMP_n+1> stdout P partial_log_1
  <TIMESTAMP_n+2> stdout F rest_of_partial_log
  <TIMESTAMP_n+3> stdout F full_log_n+3

I thought that if there were an additional option, e.g., "endmsg.regex", which 
specifies the end of the multiline message, we might be able to solve the problem. 
But again, I could be wrong.
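As a rough illustration of why an end-of-message pattern would help, here is a minimal C sketch of end-anchored grouping using POSIX regcomp()/regexec(). The endmsg.regex option is only a proposal at this point in the thread; the function group_by_end() and the pattern " stdout F " are hypothetical, chosen to match the simplified example above. Lines are collected until one matches the end pattern, so the two P lines and the F line after them fall into one message, while each standalone F line forms a message of its own.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical sketch of end-anchored grouping (what a proposed endmsg.regex
 * might do): a line matching end_pattern closes the current message.
 * For each input line, msg_ids[i] receives the index of the message it
 * belongs to. Returns the number of completed messages; trailing lines
 * with no end match stay in an unfinished message. Returns -1 if the
 * pattern does not compile. */
int group_by_end(const char *const lines[], size_t n,
                 const char *end_pattern, int msg_ids[])
{
    assert(end_pattern != NULL && (n == 0 || (lines != NULL && msg_ids != NULL)));
    regex_t end_re;
    if (regcomp(&end_re, end_pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int next = 0;                       /* index of the message being built */
    for (size_t i = 0; i < n; i++) {
        msg_ids[i] = next;
        if (regexec(&end_re, lines[i], 0, NULL, 0) == 0)
            next++;                     /* this line ends the message */
    }
    regfree(&end_re);
    return next;
}
```

A start-only pattern cannot express this grouping, because every P line would open a new message.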

Another issue is that when merging the multiline messages into one, we want to 
do so with just the message part.  Using this example:
  <TIMESTAMP_n> stdout P partial_log_0
  <TIMESTAMP_n+1> stdout P partial_log_1
  <TIMESTAMP_n+2> stdout F rest_of_partial_log
We want to assemble
  <TIMESTAMP_n+2> stdout F partial_log_0 partial_log_1 rest_of_partial_log
instead of
  <TIMESTAMP_n> stdout P partial_log_0 <TIMESTAMP_n+1> stdout P partial_log_1 <TIMESTAMP_n+2> stdout F rest_of_partial_log

Do you think it's doable in imfile?

Even with endmsg.regex support, what we want is to concatenate the _message_ 
fields together, not the _entire lines_.  That is, if the input is this:

  <TIMESTAMP_n> stdout P partial_log_0
  <TIMESTAMP_n+1> stdout P partial_log_1
  <TIMESTAMP_n+2> stdout F rest_of_partial_log

We want the final output to be

  <TIMESTAMP_n> stdout F partial_log_0partial_log_1rest_of_partial_log

Not

  <TIMESTAMP_n> stdout P partial_log_0 <TIMESTAMP_n+1> stdout P partial_log_1 <TIMESTAMP_n+2> stdout F rest_of_partial_log
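The requirement above (join only the message fields, keep the first line's timestamp, mark the result F) can be sketched in C as follows. This assumes the simplified format from this thread, "<timestamp> <stream> <P|F> <message>" with single spaces between fields; merge_partials() and split_cri_line() are hypothetical helpers, not part of rsyslog. The variant asked for earlier in the thread (last timestamp, spaces between the parts) would only change which timestamp is copied and add a separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One simplified CRI-O line: "<timestamp> <stream> <P|F> <message>". */
struct cri_line {
    const char *ts;     size_t ts_len;
    const char *stream; size_t stream_len;
    char tag;           /* 'P' (partial) or 'F' (full) */
    const char *msg;    /* rest of the line, NUL-terminated */
};

/* Split a line at its first three spaces; returns 0 on success, -1 if malformed. */
static int split_cri_line(const char *line, struct cri_line *out)
{
    const char *sp1 = strchr(line, ' ');
    if (!sp1) return -1;
    const char *sp2 = strchr(sp1 + 1, ' ');
    if (!sp2) return -1;
    const char *sp3 = strchr(sp2 + 1, ' ');
    if (!sp3 || sp3 != sp2 + 2) return -1;   /* tag must be one character */
    out->ts = line;         out->ts_len = (size_t)(sp1 - line);
    out->stream = sp1 + 1;  out->stream_len = (size_t)(sp2 - sp1 - 1);
    out->tag = sp2[1];
    out->msg = sp3 + 1;
    return 0;
}

/* Merge n lines (P ... P F) into "<first ts> <stream> F <msg0><msg1>...".
 * Returns 0 on success, -1 on malformed input or if out is too small. */
int merge_partials(const char *const lines[], size_t n, char *out, size_t outsz)
{
    assert(n > 0 && lines != NULL && out != NULL && outsz > 0);
    struct cri_line first, cur;
    if (split_cri_line(lines[0], &first) != 0) return -1;

    int len = snprintf(out, outsz, "%.*s %.*s F ", (int)first.ts_len, first.ts,
                       (int)first.stream_len, first.stream);
    if (len < 0 || (size_t)len >= outsz) return -1;
    size_t used = (size_t)len;

    for (size_t i = 0; i < n; i++) {
        if (split_cri_line(lines[i], &cur) != 0) return -1;
        /* every line but the last must be partial; the last must be full */
        if (cur.tag != (i + 1 < n ? 'P' : 'F')) return -1;
        size_t mlen = strlen(cur.msg);
        if (used + mlen >= outsz) return -1;
        memcpy(out + used, cur.msg, mlen + 1);   /* copies the NUL too */
        used += mlen;
    }
    return 0;
}
```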

Another problem is that if imfile is spewing out messages with partial lines in them, there is no guarantee that a subsequent filter will see the records in order, due to the multi-threaded nature of rsyslog, so we need a way to uniquely identify and absolutely order each message.  Fortunately, we have 3 message fields to use: $!metadata!filename and $!metadata!offset uniquely identify the message within each file, and we can use the TIMESTAMP if we need to order messages in files that may have been rotated.  Then we'll need to implement in rsyslog something like the "Handle containerd/cri in Kubernetes" example at https://github.com/fluent-plugins-nursery/fluent-plugin-concat#usage.
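One way to express "uniquely identify and absolutely order" is a sort key built from those three fields. The sketch below is a hypothetical qsort() comparator in C; struct rec_key and its ts_ns field are assumptions (the TIMESTAMP is assumed to be already converted to nanoseconds since the epoch and non-decreasing within a file). Records are grouped by filename, then ordered by time, so a rotated file whose offsets restart at 0 still sorts after its predecessor, and the byte offset breaks ties.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ordering key built from $!metadata!filename, the parsed
 * TIMESTAMP (assumed converted to nanoseconds since the epoch) and
 * $!metadata!offset. */
struct rec_key {
    const char *filename;
    long long ts_ns;
    long long offset;
};

/* qsort comparator: group by file, then order by time (covers a rotated
 * file whose offsets restart at 0), then by byte offset within the file. */
static int cmp_rec_key(const void *a, const void *b)
{
    const struct rec_key *x = a, *y = b;
    int c = strcmp(x->filename, y->filename);
    if (c != 0) return c;
    if (x->ts_ns != y->ts_ns) return x->ts_ns < y->ts_ns ? -1 : 1;
    if (x->offset != y->offset) return x->offset < y->offset ? -1 : 1;
    return 0;
}

void sort_records(struct rec_key *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    qsort(recs, n, sizeof recs[0], cmp_rec_key);
}
```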

Alternately, we could handle this in imfile if we extended imfile to support 
pluggable parsers.  The plugin would provide an entry point, e.g. a function 
parse(char *line):

- imfile reads a line and calls parse(line)
- the parser parses the line (which means it will need to call mmnormalize/mmjsonparse . . .) and checks for multiline conditions
- if the line is part of a multiline message, the parser returns a value to imfile indicating 'I need another line'
- once the parser has read all of the lines, it returns the complete message to imfile; that is, it concatenates the "log" fields
- presumably the "stream" fields all have the same value, but it is unclear what to do with the "time" field: perhaps use only the first, or the last, or ???  The parser could store the "time" fields in local variables, e.g. $.imfile!firsttime and $.imfile!lasttime, holding the "time" fields read from the first and last lines of the message, respectively.  I'm assuming the user will want to associate either the first time or the last time with the message as its actual "@timestamp".
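A minimal C sketch of that entry point follows. It is not an existing rsyslog or imfile API: parse_status, parser_state and parse() are hypothetical names, and the signature takes a per-file state pointer in addition to the line, since the parser has to remember earlier lines. To stay self-contained it handles only the simplified "<time> <stream> <P|F> <message>" format from this thread instead of calling mmnormalize or mmjsonparse, ignores the stream field, and records both the first and last times, mirroring the proposed $.imfile!firsttime and $.imfile!lasttime.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical return codes for the proposed parse() entry point. */
enum parse_status { PARSE_COMPLETE, PARSE_NEED_MORE, PARSE_ERROR };

/* Per-file parser state; imfile would keep one per monitored file. */
struct parser_state {
    int in_progress;        /* 1 while P lines are being accumulated */
    char firsttime[64];     /* time of the first line (cf. $.imfile!firsttime) */
    char lasttime[64];      /* time of the last line (cf. $.imfile!lasttime) */
    char msg[4096];         /* concatenated message fields */
    size_t len;
};

/* Feed one line "<time> <stream> <P|F> <message>". Returns PARSE_NEED_MORE
 * after a P line, PARSE_COMPLETE once an F line finishes the message (the
 * result is then in st->msg), or PARSE_ERROR on malformed/oversized input. */
enum parse_status parse(struct parser_state *st, const char *line)
{
    assert(st != NULL && line != NULL);
    const char *sp1 = strchr(line, ' ');
    const char *sp2 = sp1 ? strchr(sp1 + 1, ' ') : NULL;
    if (!sp2 || (sp2[1] != 'P' && sp2[1] != 'F') || sp2[2] != ' ')
        return PARSE_ERROR;
    size_t tlen = (size_t)(sp1 - line);
    if (tlen >= sizeof st->firsttime) return PARSE_ERROR;
    const char *text = sp2 + 3;
    size_t mlen = strlen(text);

    if (!st->in_progress) {              /* first line of a new message */
        st->len = 0;
        memcpy(st->firsttime, line, tlen);
        st->firsttime[tlen] = '\0';
        st->in_progress = 1;
    }
    memcpy(st->lasttime, line, tlen);
    st->lasttime[tlen] = '\0';

    if (st->len + mlen >= sizeof st->msg) {   /* give up on an oversized message */
        st->in_progress = 0;
        return PARSE_ERROR;
    }
    memcpy(st->msg + st->len, text, mlen + 1);
    st->len += mlen;

    if (sp2[1] == 'P') return PARSE_NEED_MORE;
    st->in_progress = 0;                 /* F line: message is complete */
    return PARSE_COMPLETE;
}
```

Feeding the three example lines returns PARSE_NEED_MORE, PARSE_NEED_MORE, then PARSE_COMPLETE, after which st->msg holds partial_log_0partial_log_1rest_of_partial_log.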

Note that this isn't specific to cri-o - docker --log-driver=json-file logs 
have the same problem of having to first parse the log to determine if it is 
multiline or not.


Thanks,
--noriko

Note that a LOT of log processing tools assume a log message is a single line, 
so you probably want to have newlines escaped in the message before sending it 
to other tools for processing.
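A minimal C sketch of that escaping, assuming backslash escapes are acceptable to the downstream tools; escape_newlines() is a hypothetical helper, not an rsyslog function. Backslashes are escaped as well, so the original text can be recovered unambiguously.

```c
#include <assert.h>
#include <stddef.h>

/* Escape a message so it fits on one line: '\n' -> "\\n" and '\\' -> "\\\\"
 * (escaping the backslash keeps the transformation reversible).
 * Returns the escaped length, or -1 if out (size outsz) is too small. */
long escape_newlines(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);
    size_t j = 0;
    for (; *in != '\0'; in++) {
        const char *rep = NULL;
        if (*in == '\n') rep = "\\n";
        else if (*in == '\\') rep = "\\\\";
        size_t need = rep ? 2 : 1;
        if (j + need >= outsz) return -1;   /* keep room for the NUL */
        if (rep) { out[j] = rep[0]; out[j + 1] = rep[1]; }
        else out[j] = *in;
        j += need;
    }
    if (j >= outsz) return -1;
    out[j] = '\0';
    return (long)j;
}
```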

David Lang


On Wed, 5 Sep 2018, Noriko Hosoi via rsyslog wrote:

Date: Wed, 5 Sep 2018 14:04:42 -0700
From: Noriko Hosoi via rsyslog <rsyslog@lists.adiscon.com>
To: rsyslog@lists.adiscon.com
Cc: Noriko Hosoi <nho...@redhat.com>
Subject: [rsyslog] Question on multiline log messages

Hello, Rsyslog List,

We have a requirement to process multiline log messages in a log file.

The simplified log format looks like this.

  <TIMESTAMP_0> stdout F full_log_0
    ........................
  <TIMESTAMP_n-1> stdout F full_log_n-1
  <TIMESTAMP_n> stdout P partial_log_0
  <TIMESTAMP_n+1> stdout P partial_log_1
  <TIMESTAMP_n+2> stdout F rest_of_partial_log
  <TIMESTAMP_n+3> stdout F full_log_n+3

In this example, the first line matching "/.* stdout P .*/" marks the beginning of the multiline message.  Intermediate lines, which may be absent or may repeat, also match "/.* stdout P .*/", until a line matching "/.* stdout F .*/" ends the message.  Please note that 'P' stands for Partial and 'F' for Full.

Since the messages are logged in a file, we'd like to use the imfile plugin to 
read logs from the file and merge the multiline messages into one line as 
follows.

  <TIMESTAMP_0> stdout F full_log_0
    ........................
  <TIMESTAMP_n-1> stdout F full_log_n-1
  <TIMESTAMP_n+2> stdout F partial_log_0 partial_log_1 rest_of_partial_log
  <TIMESTAMP_n+3> stdout F full_log_n+3

Additionally, a message can be split into multiple lines periodically, and 
there are multiple log files to be processed simultaneously.

If you are curious, this is the log format used by the cri-o container runtime.

Do you happen to have experience configuring rsyslog to fulfill such 
requirements?  If so, and you could share it with us, we'd greatly 
appreciate it.

Thanks,
--noriko

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
