I'd love to see a PR for this. I know there are others in the community looking for something similar.
On Sun, Aug 26, 2018 at 7:28 PM <jskar...@gmail.com> wrote:

> Hello,
>
> We have implemented a general purpose regex parser for Metron that we are
> interested in contributing back to the community.
>
> While the Metron Grok parser provides some regex based capability today,
> the intention of this general purpose regex parser is to:
>
>    1. Allow for more advanced parsing scenarios (specifically, dealing with
>    multiple regex lines for devices that contain several log formats within
>    them)
>    2. Give users and developers of Metron additional options for parsing
>    3. Together with the new parser chaining and regex routing features
>    available in Metron, give some additional flexibility to logically
>    separate a flow by:
>       1. Regex routing to segregate logs at a device level and handle
>       envelope unwrapping
>       2. This general purpose regex parser to parse an entire device type
>       that contains multiple log formats within the single device (for
>       example, RHEL logs)
>
> At a high level, the control flow is:
>
> 1. Identify the record type of the incoming raw message.
> 2. Find and apply the regular expression of the corresponding record type
> to extract the fields (using named groups).
> 3. Apply the message header regex to extract the fields in the header part
> of the message (using named groups).
>
> The parser config uses the following structure:
>
> "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))",
> "messageHeaderRegex": "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
> "fields": [
>   {
>     "recordType": "kernel",
>     "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
>   },
>   {
>     "recordType": "syslog",
>     "regex": ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
>   }
> ]
>
> Where:
>
>    - recordTypeRegex is used to distinctly identify a record type. It
>    takes a valid regular expression and may also have named groups, which
>    are extracted into fields.
>    - messageHeaderRegex is used to specify a regular expression to extract
>    fields from the part of a message that is common across all messages
>    (e.g., syslog fields, standard headers).
>    - fields is a JSON list of objects, each containing recordType and
>    regex. The regex that is evaluated is chosen based on the output of
>    recordTypeRegex.
>    - Note: recordTypeRegex and messageHeaderRegex can also be specified as
>    lists (JSON arrays), in which case the list is evaluated in order until
>    a matching regular expression is found.
>
> If there are no objections to having this type of Parser within Metron, we
> will open a JIRA/PR for code review.
>
> *Jagdeep Singh*
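
For anyone following the thread, here is a rough sketch of the three-step control flow described above. It is a minimal Python illustration under stated assumptions, not the proposed Metron parser: the patterns are simplified stand-ins for the ones in the quoted config, named groups use Python's `(?P<name>...)` syntax rather than Java's `(?<name>...)`, and `first_match` and `parse` are hypothetical helper names. It also assumes the record type is the text matched by recordTypeRegex, which the proposal does not spell out.

```python
import re

# Hypothetical config: the keys follow the proposal, but the patterns are
# simplified and rewritten for Python's re module.
CONFIG = {
    # A list is allowed here; entries are tried in order until one matches.
    "recordTypeRegex": [r"(?<=\s)(?P<process>kernel|syslog)(?=\[|:)"],
    "messageHeaderRegex": (
        r"^<(?P<syslogpriority>\d{1,4})>"
        r"(?P<timestamp>[A-Za-z]{3}\s{1,2}\d{1,2}\s\d{1,2}:\d{1,2}:\d{1,2})\s"
        r"(?P<syslogHost>\S+)\s"
    ),
    "fields": [
        {"recordType": "kernel",
         "regex": r"kernel(?:\[\d+\])?:\s*(?P<eventInfo>.*)$"},
        {"recordType": "syslog",
         "regex": r"syslog(?:\[\d+\])?:\s*(?P<eventInfo>.*)$"},
    ],
}


def first_match(patterns, text):
    """Try a single pattern or a list of patterns in order; return the first match."""
    if isinstance(patterns, str):
        patterns = [patterns]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match
    return None


def parse(message, config=CONFIG):
    """Return a dict of extracted fields, or None if the record type is unknown."""
    fields = {}

    # Step 1: identify the record type of the incoming raw message.
    type_match = first_match(config["recordTypeRegex"], message)
    if type_match is None:
        return None
    fields.update(type_match.groupdict())
    record_type = type_match.group(0)  # assumption: matched text is the type

    # Step 2: apply the regex registered for that record type.
    for entry in config["fields"]:
        if entry["recordType"] == record_type:
            body_match = re.search(entry["regex"], message)
            if body_match:
                fields.update(body_match.groupdict())
            break

    # Step 3: apply the message header regex to extract the common fields.
    header_match = first_match(config["messageHeaderRegex"], message)
    if header_match:
        fields.update(header_match.groupdict())
    return fields


sample = "<6>Aug 26 19:28:01 host1 kernel: usb 1-1: new device found"
result = parse(sample)
```

With the made-up sample line, `result` should hold the `process`, `eventInfo`, `syslogpriority`, `timestamp`, and `syslogHost` fields. A message whose process is neither kernel nor syslog returns `None`, which is where a real parser would presumably report an error instead.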