On 09/06/2018 12:49 PM, Rich Megginson wrote:
On 09/06/2018 11:30 AM, David Lang wrote:
On Thu, 6 Sep 2018, Rich Megginson wrote:

On 09/06/2018 12:00 PM, Noriko Hosoi via rsyslog wrote:
Thank you for your response, David.

On 09/06/2018 06:18 AM, David Lang wrote:
There is an option for imfile to split a file into multiline messages based on 
a regex, that is probably the right starting point.
Yes, we also expected startmsg.regex could be used for our purpose.
https://www.rsyslog.com/doc/v8-stable/configuration/modules/imfile.html#startmsg-regex

In our case, if we set "/.* stdout P .*/" to startmsg.regex. It also matches the next partial log "<TIMESTAMP_n+1> stdout P partial_log_1" and is considered the first part of the another multiline message.  (Please correct me if I'm wrong...)
  <TIMESTAMP_0> stdout F full_log_0
    ........................
  <TIMESTAMP_n-1> stdout F full_log_n-1
  <TIMESTAMP_n> stdout P partial_log_0
  <TIMESTAMP_n+1> stdout P partial_log_1
  <TIMESTAMP_n+2> stdout F rest_of_partial_log
  <TIMESTAMP_n+3> stdout F full_log_n+3

I thought if there were an optional option, e.g., "endmsg.regex" which 
specifies the end of the multiline, we may be able to solve the problem.  But again I 
could be wrong.

Another issue is when merging the multiline messages into one, we want to so 
just with the message part.  Using this example:
  <TIMESTAMP_n> stdout P partial_log_0
  <TIMESTAMP_n+1> stdout P partial_log_1
  <TIMESTAMP_n+2> stdout F rest_of_partial_log
We want to assemble
  <TIMESTAMP_n+2> stdout F partial_log_0 partial_log_1 rest_of_partial_log
instead of
  <TIMESTAMP_n> stdout P partial_log_0 <TIMESTAMP_n+1> stdout P partial_log_1 
<TIMESTAMP_n+2> stdout F rest_of_partial_log

Do you think it's doable in imfile?

Even with endmsg.regex support, what we want is to concatenate the _message_ 
fields together, not the _entire lines_.  That is, if the input is this:

  <TIMESTAMP_n> stdout P partial_log_0
  <TIMESTAMP_n+1> stdout P partial_log_1
  <TIMESTAMP_n+2> stdout F rest_of_partial_log

We want the final output to be

  <TIMESTAMP_n> stdout F partial_log_0partial_log_1rest_of_partial_log

Not

  <TIMESTAMP_n> stdout P partial_log_0 <TIMESTAMP_n+1> stdout P partial_log_1 
<TIMESTAMP_n+2> stdout F rest_of_partial_log

you would have to end up doing this by parsing the logs via mmnormalize (probably using the 'repeat' option), and then using the 'foreach' capability to combine the different parts into one message.

Ok, then this might just work.  Thanks!


Another problem is that if imfile is spewing out messages with partial lines in them, there is no guarantee that a subsequent filter is going to see the records in order, due to the multi-threaded nature of rsyslog - so we need a way to uniquely identify and absolutely order each message.  Fortunately, we have 3 message fields to use - $!metadata!filename and $!metadata!offset uniquely identify the message within each file, and we can use the TIMESTAMP if we need to order messages in files that may have been rotated. Then we'll need to implement something like https://github.com/fluent-plugins-nursery/fluent-plugin-concat#usage "Handle containerd/cri in Kubernetes" in rsyslog.

imfile is single-threaded, and so if you use the regex option to define what one message is, the rest of rsyslog will see that one message (so clean it up before you forward it to avoid the headaches that you can run into with forwarding)

Ok, good to know.


Alternately, we could handle this in imfile if we extended imfile to have 
support for pluggable parsers.  The plugin would provide an entrypoint e.g. a 
function parse(char *line)

lucky for you, I believe that you already have the option of putting a parser 
on imfile. It can be a mmnormalize config, or you can define an external parser

Also good to know.

I don't see anything about a parser at 
https://www.rsyslog.com/doc/v8-stable/configuration/modules/imfile.html or in 
the imfile.c source code.



- imfile reads a line, calls parse(line)
- parser parses the line (which means it will need to call mmnormalize/mmjsonparse . . .), checks for multiline conditions - if multiline, parser returns with some value to imfile indicating 'I need another line' - parser reads all of the multiple lines, returns the complete message to imfile - that is, it concatenates the "log" fields - not sure what it should do with the "time" field, perhaps use only the first or last or ??? presumably the "stream" fields all have the same value.  It could stick the "time" fields in a local variable e.g. $.imfile!firsttime and $.imfile!lasttime which are the "time" fields read from the first and last lines in the message, respectively.  I'm assuming the user will either want to associate the first time or the last time with the message as the actual "@timestamp" for the message.

this process won't work because the first part isn't:

"imfile reads a line, calls parse(line)"

it's

imfile reads to the end of a message, calls parse message

and there's no chance to then combine messages.

so the first thing is to read the entire message in via the regex option, then 
parse the message and reformat it.

you may want to also look at the parser module functionality, this lets you use 
C code for your parser.

Note that this isn't specific to cri-o - docker --log-driver=json-file logs 
have the same problem of having to first parse the log to determine if it is 
multiline or not.

you should not be parsing the line to figure out if it's multiline. multiline 
messages have different stuff at the start than other lines, you key off of 
that.

That is, if multiline specifically means records like this:

first line of recordA
   recordA second line
   recordA third line
....
first line of recordB

Perhaps Noriko and I mean something slightly different by "multiline" - a single message spread out in some way across multiple lines in a log file, not distinguished by the characters at the start of the line.

There's already the aforementioned case of crio log file lines which are distinguished by 
the F or the P in the "partial" field. docker json-file log files have a 
similar problem:

{"time":"<TIMESTAMP_n>", "stream":"stdout", "log":"partial_log_0"}
{"time":"<TIMESTAMP_n+1>","stream":"stdout", "log":"partial_log_1"}
{"time":"<TIMESTAMP_n+2>", "stream":"stdout", "log":"rest_of_partial_log\n"}

In this case, the partial message is indicated by the absence of the trailing \n in the "log" string value, and the final part of the message has the trailing \n.  I'm assuming we could come up with an endmsg.regex value which would work in this case.  Then we would get a single message string like this:

{"time":"<TIMESTAMP_n>", "stream":"stdout", "log":"partial_log_0"}{"time":"<TIMESTAMP_n+1>","stream":"stdout", "log":"partial_log_1"}{"time":"<TIMESTAMP_n+2>", "stream":"stdout", "log":"rest_of_partial_log\n"}

Would we be able to use mmnormalize with repeat and foreach to parse this into

{"time":"<TIMESTAMP_n>", "stream":"stdout", 
"log":"partial_log_0partial_log_1rest_of_partial_log\n"}

?


David Lang


Thanks,
--noriko

Note that a LOT of log processing tools assume a log message is a single line, 
so you probably want to have newlines escaped in the message before sending it 
to other tools for processing.

David Lang


On Wed, 5 Sep 2018, Noriko Hosoi via rsyslog wrote:

Date: Wed, 5 Sep 2018 14:04:42 -0700
From: Noriko Hosoi via rsyslog <rsyslog@lists.adiscon.com>
To: rsyslog@lists.adiscon.com
Cc: Noriko Hosoi <nho...@redhat.com>
Subject: [rsyslog] Question on multiline log messages

Hello, Rsyslog List,

We have a requirement to process multiline log messages in a log file.

The simplified log format looks like this.

  <TIMESTAMP_0> stdout F full_log_0
    ........................
  <TIMESTAMP_n-1> stdout F full_log_n-1
  <TIMESTAMP_n> stdout P partial_log_0
  <TIMESTAMP_n+1> stdout P partial_log_1
  <TIMESTAMP_n+2> stdout F rest_of_partial_log
  <TIMESTAMP_n+3> stdout F full_log_n+3

In this example, the first "/.* stdout P .*/" tells it's the beginning of the multiline message.  The next intermediate line also matches "/.* stdout P .*/", which may not exist or could repeat until it hits "/.* stdout F .*/".  Please note that 'P' stands for Partial; 'F' for Full.

Since the messages are logged in a file, we'd like to use the imfile plugin to 
read logs from the file and merge the multiline messages into one line as 
follows.

  <TIMESTAMP_0> stdout F full_log_0
    ........................
  <TIMESTAMP_n-1> stdout F full_log_n-1
  <TIMESTAMP_n+2> stdout F partial_log_0 partial_log_1 rest_of_partial_log
  <TIMESTAMP_n+3> stdout F full_log_n+3

Additionally, the split into the multiline message could occur periodically and 
there are multiple log files to be processed simultaneously.

If you are curious, it is the log format from the cri-o container.

Do you happen to have an experience to configure rsyslog to fulfill such 
requirements?  If you have and could share them with us, we'd greatly 
appreciate it.

Thanks,
--noriko

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.





_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to