I'd love to see a PR for this.  I know there are others in the community
looking for something similar.

On Sun, Aug 26, 2018 at 7:28 PM <jskar...@gmail.com> wrote:

> Hello,
>
>
>
> We have implemented a general purpose regex parser for Metron that we are
> interested in contributing back to the community.
>
>
>
> While the Metron Grok parser provides some regex based capability today,
> the intention of this general purpose regex parser is to:
>
>    1. Allow for more advanced parsing scenarios (specifically, dealing with
>    multiple regex lines for devices that contain several log formats within
>    them)
>    2. Give users and developers of Metron additional options for parsing
>    3. With the new parser chaining and regex routing feature available in
>    Metron, this gives some additional flexibility to logically separate a
>    flow by:
>       1. Regex routing to segregate logs at a device level and handle
>       envelope unwrapping
>       2. This general purpose regex parser to parse an entire device type
>       that contains multiple log formats within the single device (for
>       example, RHEL logs)
>
>
>
> At a high level, the control flow is like this:
>
> 1. Identify the record type of the incoming raw message.
>
> 2. Find and apply the regular expression of the corresponding record type to
> extract the fields (using named groups).
>
> 3. Apply the message header regex to extract the fields in the header part
> of the message (using named groups).
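
The three steps above could be sketched roughly as follows. This is only an
illustration and not the contributed parser: the class RegexFlowSketch and the
helpers extract and parse are hypothetical names, the sample log line is
invented, and only the "kernel" entry of the config from this thread is
reproduced. Group names are passed explicitly because older Java versions have
no API to list the named groups of a pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the three-step flow; not the contributed parser.
final class RegexFlowSketch {
    // Patterns copied from the config example in this thread.
    static final Pattern RECORD_TYPE = Pattern.compile(
        "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))");
    static final Pattern HEADER = Pattern.compile(
        "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?"
        + "(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?"
        + "(?<syslogHost>(?<=\\s).*?(?=\\s))");
    // Only the "kernel" entry of "fields" is reproduced here.
    static final Map<String, Pattern> FIELDS = Map.of(
        "kernel", Pattern.compile(".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"));

    // Copies the listed named groups into out if the pattern is found in msg.
    static boolean extract(Pattern p, String msg, Map<String, String> out, String... groups) {
        Matcher m = p.matcher(msg);
        if (!m.find()) {
            return false;
        }
        for (String g : groups) {
            out.put(g, m.group(g));
        }
        return true;
    }

    static Map<String, String> parse(String msg) {
        Map<String, String> out = new LinkedHashMap<>();
        // Step 1: identify the record type.
        if (!extract(RECORD_TYPE, msg, out, "process")) {
            return out; // unknown record type: nothing is extracted
        }
        // Step 2: apply the regex registered for that record type.
        Pattern body = FIELDS.get(out.get("process"));
        if (body != null) {
            extract(body, msg, out, "eventInfo");
        }
        // Step 3: apply the message header regex.
        extract(HEADER, msg, out, "syslogpriority", "timestamp", "syslogHost");
        return out;
    }

    public static void main(String[] args) {
        // Invented sample line, used only for illustration.
        String msg = "<6>Aug 26 19:28:01 myhost kernel: device eth0 entered promiscuous mode";
        parse(msg).forEach((k, v) -> System.out.println(k + " = [" + v + "]"));
    }
}
```

Note that the values are stored exactly as captured, so a field such as
eventInfo may carry leading whitespace unless it is trimmed afterwards.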
>
>
> The parser config uses the following structure:
>
>    "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))",
>    "messageHeaderRegex": "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
>    "fields": [
>       {
>         "recordType": "kernel",
>         "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
>       },
>       {
>         "recordType": "syslog",
>         "regex": ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
>       }
>    ]
>
>
>
> Where:
>
>    - recordTypeRegex is used to distinctly identify a record type. It
>    takes a valid regular expression and may also have named groups, which
>    are extracted into fields.
>    - messageHeaderRegex is used to specify a regular expression to extract
>    fields from the part of a message that is common across all messages
>    (e.g., syslog fields, standard headers).
>    - fields: JSON list of objects containing recordType and regex. The
>    regex that is evaluated is chosen based on the output of the
>    recordTypeRegex.
>    - Note: recordTypeRegex and messageHeaderRegex could be specified as
>    lists also (as a JSON array), where the list will be evaluated in order
>    until a matching regular expression is found.
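
The "evaluated in order until a matching regular expression is found" behaviour
in the note above might look something like this. It is a minimal sketch under
assumptions: OrderedRegexSketch and firstMatch are hypothetical names, and the
second pattern (for sshd) and the sample line are invented for illustration and
do not come from the proposed config.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "first matching regex wins"; not the contributed parser.
final class OrderedRegexSketch {
    // Returns the matcher of the first pattern found in msg, in list order.
    static Optional<Matcher> firstMatch(List<Pattern> candidates, String msg) {
        for (Pattern p : candidates) {
            Matcher m = p.matcher(msg);
            if (m.find()) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Pattern> recordTypeRegex = List.of(
            // From the config example in this thread.
            Pattern.compile("(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"),
            // Assumed second entry, added only for illustration.
            Pattern.compile("(?<process>(?<=\\s)\\bsshd\\b(?=\\[|:))"));
        // Invented sample line.
        String msg = "<38>Aug 26 19:28:01 myhost sshd[1234]: Accepted publickey";
        firstMatch(recordTypeRegex, msg)
            .ifPresent(m -> System.out.println("process = " + m.group("process")));
    }
}
```

Because the list is walked in order, more specific expressions would need to
come before more general ones if both can match the same message.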
>
>
>
>
>
> If there are no objections to having this type of Parser within Metron, we
> will open a JIRA/PR for code review.
>
> *Jagdeep Singh*
>
