simple-syslog-5424 uses antlr4 instead of regex because I was unable to
find or develop regexes that parse structured data in a single pass.  If you
look around you’ll find that most platforms’ support for 5424 does not handle
structured data, and is implemented with regex.  The legacy NiFi syslog
support, which takes its regex from Flume, was like this, for example.  NiFi
now supports structured data because it too uses simple-syslog-5424 for
that.  The lib also offers interfaces and base functionality for building new
parser logic on top of the grammar, in addition to the default implementation.

Regex performance should be OK, I think, as long as the regexes are
compiled once and then cached or held in static fields.
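
To make the cached/static point concrete, here is a rough sketch of a
single-pass header/message split with a precompiled pattern.  The class name,
the field names and the pattern itself are made up for illustration (this is
not code from simple-syslog-5424, NiFi or Flume), and the pattern is
deliberately simplified: it only accepts "-" (nil) where structured data
would go, which is exactly the part I could not get regex to handle.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogHeaderSplitter {
    // Compiled once at class load and reused for every line. Pattern is
    // immutable and thread-safe; only the per-line Matcher is created anew.
    // Hypothetical, simplified RFC 5424 pattern: PRI, VERSION, TIMESTAMP,
    // HOSTNAME, APP-NAME, PROCID, MSGID, then nil structured data ("-"),
    // then an optional message.
    private static final Pattern HEADER = Pattern.compile(
        "^<(\\d{1,3})>(\\d{1,2}) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) -(?: (.*))?$");

    // Returns the header fields plus the raw message, or null when the line
    // does not match (for example when it carries structured data).
    public static Map<String, String> split(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("priority", m.group(1));
        fields.put("version", m.group(2));
        fields.put("timestamp", m.group(3));
        fields.put("hostname", m.group(4));
        fields.put("appName", m.group(5));
        fields.put("procId", m.group(6));
        fields.put("msgId", m.group(7));
        // The message is left unparsed so a chained parser can handle it.
        fields.put("message", m.group(8) == null ? "" : m.group(8));
        return fields;
    }
}
```

The "message" value is what you would hand to the next parser in the chain
(CSV, another regex, etc.).  What you want to avoid is calling
Pattern.compile() or String.matches() per line, since both recompile the
pattern every time.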

Note that I plan to develop simple-syslog-3164 soon, probably using regex
with injectable “message” parsing (with a follow-on to create a third,
unified simple-syslog lib).  It will share common headers etc. with the 5424
lib.  This will be done in the https://github.com/palindromicity org.


On November 1, 2018 at 01:12:53, Muhammed Irshad (irshadkt....@gmail.com)
wrote:

I have to parse large volumes of syslog data collected in Splunk in
different indexes. It seems Splunk can be configured in different ways to
collect syslog data
<https://docs.splunk.com/Documentation/Splunk/7.2.0/Data/HowSplunkEnterprisehandlessyslogdata>.
I have a custom-written regex parser. I am planning to use a single-pass
regex to separate the message from the header, and then use parser chaining
to parse the message content with CSV or regex, according to the message
format. In terms of performance, considering heavy traffic (3 TB/day), is
there any problem with this approach? I see that the existing syslog5424
<https://github.com/palindromicity/simple-syslog-5424/> uses antlr4 instead
of regex. Does that have any advantage in terms of performance?

--
Muhammed Irshad K T
Senior Software Engineer
+919447946359
irshadkt....@gmail.com
Skype : muhammed.irshad.k.t
