Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-26 Thread David Lang

On Mon, 26 Feb 2018, deoren wrote:


On 2/25/2018 5:37 PM, David Lang wrote:

On Fri, 23 Feb 2018, deoren wrote:

liblognorm is so fast you really have to use it to believe it. At 
$lastjob I had a 1400 line ruleset handling >100K logs/sec without the 
liblognorm effort being noticeable


Wow, that's pretty impressive. I may try employing mmnormalize in both 
locations to see which is easier to work with. I suspect that for some 
cases it would need to be run on the receiver to handle non-rsyslog 
clients (misc equipment for example).


my first setup had the relays doing cleanup (making sure that everything is 
in a standard syslog format, fixing any bad senders, but not parsing 
messages) and then running all the parsing on the central system.


This has the advantage that all the parsing is done in one place, easy to 
tweak and restart.


At my new job, I may push this out to the relays, as all the configs are 
centrally managed and even for the central box, the process is to check it 
in to git and wait for it to get pushed out. In this situation I may push 
the parsing out to the edge and have everything arrive at the central box 
in JSON


With this setup, how do you apply the changes to the central box after the 
updated config file is pulled or pushed to the machine? Is the rsyslog 
restart scripted?


Yes, when the config changes, rsyslog needs to be restarted

Are you considering switching to this approach because of the workflow around 
git?


Yes, they have very clearly drunk the config management Kool-Aid that everything 
must be put in a central config management and published from there (even if it 
takes over an hour for the change to make it out)


I believe in having config management and version control, but I'm far more 
accepting that the change may happen at the leaf and get picked up by the config 
management system rather than being pushed out

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-26 Thread deoren

On 2/27/2018 12:36 AM, David Lang wrote:

On Mon, 26 Feb 2018, deoren wrote:



you are better using mmnormalize, with your example you would have a 
rule


rule=: %ip:ipv4% - %host:word% [%timestamp:char-to:]%]%-:rest%

this would create $!ip, $!host and $!timestamp (note I did this from 
memory, I may have a subtle bug here)


I finally looped back around to this and tested the provided rule. I'm 
not sure how significant it normally is, but the space between the 
colon and the first % sign seemed to throw off the rule, otherwise 
with that space removed the provided rule worked well.


correct, my mistake. The rule is very literal, it was looking for a 
space before the IP address.


Not a problem, thank you for taking the time to provide it.
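
The literal matching discussed above can be illustrated with a small Python stand-in. This is not liblognorm and does not use its API; parse_line and its prefix parameter are hypothetical names invented for this sketch. It only mimics the one rule quoted in this thread (an IPv4 address, the literal " - ", a word, then everything up to the closing bracket), and shows why a stray leading space in the rule stops the sample line from matching.

```python
import ipaddress


def parse_line(line, prefix=""):
    """Toy stand-in for: rule=:<prefix>%ip:ipv4% - %host:word% [%timestamp:char-to:]%]%-:rest%

    Every character outside a %field% is treated as literal text that must
    be present in the input, including any leading space in `prefix`.
    """
    if not line.startswith(prefix):
        return None  # literal prefix (e.g. a stray space) not found
    rest = line[len(prefix):]

    ip, sep, rest = rest.partition(" - ")
    if not sep:
        return None
    try:
        ipaddress.IPv4Address(ip)  # rough equivalent of the ipv4 field type
    except ValueError:
        return None

    host, sep, rest = rest.partition(" [")
    if not sep or not host or " " in host:
        return None  # a "word" may not be empty or contain spaces

    timestamp, sep, rest = rest.partition("]")
    if not sep or not timestamp:
        return None

    return {"ip": ip, "host": host, "timestamp": timestamp}


sample = '123.123.123.123 - abc1234 [20/Feb/2018:10:36:01 -0600] "GET /x HTTP/1.1"'
print(parse_line(sample))              # rule without the leading space
print(parse_line(sample, prefix=" "))  # rule with the leading space
```

With an empty prefix the three fields are extracted; with a one-space prefix the function returns None because the sample line starts directly with the IP address.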


Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-26 Thread David Lang

On Mon, 26 Feb 2018, deoren wrote:



you are better using mmnormalize, with your example you would have a rule

rule=: %ip:ipv4% - %host:word% [%timestamp:char-to:]%]%-:rest%

this would create $!ip, $!host and $!timestamp (note I did this from 
memory, I may have a subtle bug here)


I finally looped back around to this and tested the provided rule. I'm 
not sure how significant it normally is, but the space between the colon 
and the first % sign seemed to throw off the rule, otherwise with that 
space removed the provided rule worked well.


correct, my mistake. The rule is very literal, it was looking for a space before 
the IP address.


David Lang


Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-26 Thread deoren

On 2/25/2018 5:37 PM, David Lang wrote:

On Fri, 23 Feb 2018, deoren wrote:

liblognorm is so fast you really have to use it to believe it. At 
$lastjob I had a 1400 line ruleset handling >100K logs/sec without 
the liblognorm effort being noticeable


Wow, that's pretty impressive. I may try employing mmnormalize in both 
locations to see which is easier to work with. I suspect that for some 
cases it would need to be run on the receiver to handle non-rsyslog 
clients (misc equipment for example).


my first setup had the relays doing cleanup (making sure that everything 
is in a standard syslog format, fixing any bad senders, but not parsing 
messages) and then running all the parsing on the central system.


This has the advantage that all the parsing is done in one place, easy 
to tweak and restart.


At my new job, I may push this out to the relays, as all the configs are 
centrally managed and even for the central box, the process is to check 
it in to git and wait for it to get pushed out. In this situation I may 
push the parsing out to the edge and have everything arrive at the 
central box in JSON


With this setup, how do you apply the changes to the central box after 
the updated config file is pulled or pushed to the machine? Is the 
rsyslog restart scripted?


Are you considering switching to this approach because of the workflow 
around git?



Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-26 Thread deoren

On 2/20/2018 6:58 PM, David Lang wrote:

On Tue, 20 Feb 2018, deoren wrote:


On 2/20/2018 6:39 PM, deoren wrote:
I've been attempting to use the re_extract() function quite a bit 
lately to write some simple "filters" for notification purposes. I 
struggled with the syntax for a while until I realized tha the  and 
have been struggling quite a bit with the regex support for the 
re_extract() function. According to the http://www.rsyslog.com/regex/ 
page (and the re_extract function doc), Rsyslog uses POSIX ERE and 
"optionally" BRE expressions.


* Does anyone have a good guide or reference for the syntax needed?
* How do you switch the regex type from ERE to BRE? At a glance it 
appears that the BRE format is more cumbersome, so I want to make 
sure that I don't unintentionally switch that mode on somehow.


I found the differences between the two briefly described on this page:


https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions 




Does Rsyslog have complete support for ERE expressions? In other 
words, if I find a guide which covers ERE thoroughly, is that 
sufficient or are there gaps in rsyslog's support for the ERE syntax 
that I should be aware of?


Thanks.


Addendum to my earlier questions (which are still valid and "open" for 
feedback):


Real world example of what I'm working with (single line, likely 
wrapped by my mail client):


123.123.123.123 - abc1234 [20/Feb/2018:10:36:01 -0600] "GET 
http://example.org:80/servlet/SPECIFIC_PATTERN_HERE HTTP/1.1" 200 2182 
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like 
Gecko) Chrome/39.0.2171.95 Safari/537.36"


here is a PCRE regex that _seems_ to do what I want:

^([0-9]+.[0-9]+.[0-9]+.[0-9]+)\s\-\s([A-Za-z0-9]+)\s\[([0-9A-Za-z:\/\s-]+)\] 



and provides me with three match group results I can reference:

1. 123.123.123.123
2. abc1234
3. 20/Feb/2018:10:36:01 -0600

As I understand it, re_extract allows retrieving only a specific match 
at a time, so I grab the two I care about like so and save to local 
variables (this processing is done on the primary receiver):


set $.remote-ip = re_extract(
    $msg,
    "^([0-9]+\\.[0-9]+\\.[0-9]+.[0-9]+)\\s.\\s([A-Za-z0-9]+)\\s",
    0, 1,
    'unknown remote ip');

set $.remote-user = re_extract(
    $msg,
    "^([0-9]+\\.[0-9]+\\.[0-9]+.[0-9]+)\\s.\\s([A-Za-z0-9]+)\\s",
    0, 2,
    'unknown remote user');


This seems to work and only required escaping the escape character 
(how "meta").
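
One detail worth checking in the pattern as posted: the third dot in the IP portion is not escaped, and the dash between the IP and the user is matched with a bare ".", so both positions accept any character. The Python sketch below uses the standard re module as a stand-in; it is not POSIX ERE and not rsyslog, but an unescaped "." behaves the same way in both. The names LOOSE and STRICT and the malformed sample line are made up for illustration, and the patterns use single backslashes because Python raw strings do not need the doubled escaping that the rsyslog config string does.

```python
import re

# Pattern as posted (after un-doubling the backslashes): the third dot and
# the dash position are both a bare '.', which matches any character.
LOOSE = re.compile(r"^([0-9]+\.[0-9]+\.[0-9]+.[0-9]+)\s.\s([A-Za-z0-9]+)\s")

# Stricter variant: every literal dot escaped, dash matched literally.
STRICT = re.compile(r"^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s([A-Za-z0-9]+)\s")

good = "123.123.123.123 - abc1234 [20/Feb/2018:10:36:01 -0600]"
odd = "123.123.123x123 - abc1234 [20/Feb/2018:10:36:01 -0600]"  # not an IP

for name, pattern in (("loose", LOOSE), ("strict", STRICT)):
    for line in (good, odd):
        m = pattern.match(line)
        print(name, "matched" if m else "no match", m.groups() if m else None)
```

Both patterns accept the well-formed line and return the same two groups; only the loose one also accepts the malformed address.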


I've read that mmnormalize is recommended over regexes for performance 
reasons, but I have little experience with liblognorm (other than 
knowing it exists). Am I better off writing a few regex matches like 
I'm doing above or crafting (and testing) liblognorm rulesets, using 
them with mmnormalize to generate a JSON structure and then pulling 
what I want from a JSON structure?


you are better using mmnormalize, with your example you would have a rule

rule=: %ip:ipv4% - %host:word% [%timestamp:char-to:]%]%-:rest%

this would create $!ip, $!host and $!timestamp (note I did this from 
memory, I may have a subtle bug here)


I finally looped back around to this and tested the provided rule. I'm 
not sure how significant it normally is, but the space between the colon 
and the first % sign seemed to throw off the rule, otherwise with that 
space removed the provided rule worked well.


Thanks again for your help.

Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-25 Thread David Lang

On Fri, 23 Feb 2018, deoren wrote:

I was wondering where else other than rsyslog that liblognorm was used. 
Sounds like this is a case where others perhaps have just not heard about it.


There is at least one other project that uses it, but I don't remember what its 
name is.




Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-25 Thread David Lang

On Fri, 23 Feb 2018, deoren wrote:

liblognorm is so fast you really have to use it to believe it. At $lastjob 
I had a 1400 line ruleset handling >100K logs/sec without the liblognorm 
effort being noticeable


Wow, that's pretty impressive. I may try employing mmnormalize in both 
locations to see which is easier to work with. I suspect that for some cases 
it would need to be run on the receiver to handle non-rsyslog clients (misc 
equipment for example).


my first setup had the relays doing cleanup (making sure that everything is in a 
standard syslog format, fixing any bad senders, but not parsing messages) and 
then running all the parsing on the central system.


This has the advantage that all the parsing is done in one place, easy to tweak 
and restart.


At my new job, I may push this out to the relays, as all the configs are 
centrally managed and even for the central box, the process is to check it in to 
git and wait for it to get pushed out. In this situation I may push the parsing 
out to the edge and have everything arrive at the central box in JSON



Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-23 Thread deoren

On 2/21/2018 11:23 PM, matthew.gaetano wrote:

Liblognorm is love, Liblognorm is life


To echo Dave, $currentjob uses REK to provide services to various $client
at anywhere from 60-80k mps in realtime, plus spikes upwards of over 100k
mps. For redundancy (load balancing - waste not want not) we use two nodes
for the rsyslog but one node could easily handle the whole load. I can't get
that kind of vertical scalability with other tools; doing so often ends up
having farms of dozens of instances running (if not because of the load
itself, then because of the sequential regex parsing).


That is good info, thank you. I've heard others speak of how well 
rsyslog handles heavy loads where other related tools tend to struggle 
(or require significantly more resources). This is very encouraging.




It can take some getting used to writing Liblognorm rules but once you've
written a few you get the hang of it quickly; it's not any more effort than
learning Grok (Graylog, Logstash, regex fanatic product x). While it can't
always do everything more traditional regex based applications can, a gap
that shows more the more complicated you get, it's worth the time and effort
learning and using Liblognorm. Not only in rsyslog but in other applications
as well (I am currently using it in NiFi).


I was wondering where else other than rsyslog that liblognorm was used. 
Sounds like this is a case where others perhaps have just not heard 
about it.



Liblognorm needs more users and more attention, I implore you to use it :)


Thanks for your feedback on this. Based on all of the recommendations 
and good points shared, I will definitely invest the time to learn how 
to use it.



Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-23 Thread deoren

On 2/21/2018 7:02 PM, David Lang wrote:

On Wed, 21 Feb 2018, deoren wrote:


On 2/20/2018 6:58 PM, David Lang wrote:

On 2/20/2018 6:39 PM, deoren wrote:

In this case, my specific goal is to look for log messages 
containing "SPECIFIC_PATTERN_HERE" (as shown in sample log message) 
and if a match is found parse the message to pull out specific 
values. Those values are then used to generate a notification for 
our ticketing system (e.g., specific URL patterns indicate abuse 
that we need to review further before our vendor contacts us and 
threatens to cut off service). In this case we're not matching a 
possible range of patterns, but a very specific string that is known 
to us.


you don't need to do this two stage approach (detect a pattern, then 
parse the log) with liblognorm. Instead you just create rules for all 
your logs that include the various patterns that you want to match, 
and liblognorm uses whichever one matches. The two-stage approach is 
needed with regexes because they are so expensive to evaluate, but 
since liblognorm rules are so fast, it makes far more sense to just 
define all the rules.
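
For contrast, the two-stage pattern that regexes tend to push you toward can be sketched in Python as follows. This is only an illustration using the standard re module, not rsyslog or liblognorm code; ACCESS_RE, extract_if_flagged and the marker parameter are hypothetical names. The cheap substring test runs first so that the comparatively expensive regex is evaluated only for lines that are already known to be interesting.

```python
import re

# Stage 2 pattern: IP, user and bracketed timestamp from the sample log line.
ACCESS_RE = re.compile(
    r"^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+) - ([A-Za-z0-9]+) \[([^\]]+)\]"
)


def extract_if_flagged(line, marker="SPECIFIC_PATTERN_HERE"):
    """Return parsed fields only for lines that contain the marker."""
    # Stage 1: cheap substring check, skips the regex for most lines.
    if marker not in line:
        return None
    # Stage 2: full regex parse of the lines that passed stage 1.
    m = ACCESS_RE.match(line)
    if m is None:
        return None
    return {"ip": m.group(1), "user": m.group(2), "timestamp": m.group(3)}


flagged = (
    "123.123.123.123 - abc1234 [20/Feb/2018:10:36:01 -0600] "
    '"GET http://example.org:80/servlet/SPECIFIC_PATTERN_HERE HTTP/1.1" 200 2182'
)
print(extract_if_flagged(flagged))
print(extract_if_flagged(flagged.replace("SPECIFIC_PATTERN_HERE", "index")))
```

The point made above is that with liblognorm the first stage becomes unnecessary: every rule is defined up front and whichever one matches is used.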


Do you recommend running mmnormalize as close to the source as 
possible or on the primary receiver? I'm guessing the former so that 
the rules are run on the original source and not on content that may 
have been modified by other receivers in transit?


there are arguments both ways.

running it close to the source distributes the work (but if you run it 
on the machine that has the source, it is some extra load)


but the resulting json is typically a bit larger than the original 
message (not always, but typically) and so it can take more network 
bandwidth to send the result.


liblognorm is so fast you really have to use it to believe it. At 
$lastjob I had a 1400 line ruleset handling >100K logs/sec without the 
liblognorm effort being noticeable


Wow, that's pretty impressive. I may try employing mmnormalize in both 
locations to see which is easier to work with. I suspect that for some 
cases it would need to be run on the receiver to handle non-rsyslog 
clients (misc equipment for example).




Is mmnormalize primarily intended for content ingested via imfile or 
is it pretty standard to apply mmnormalize to all inputs? Perhaps just 
the inputs where you expected unstructured log content to be ingested?


it is very much NOT limited to imfile, it's the general purpose tool to 
convert unstructured log content to a normalized format.


Thanks for confirming. I've seen the two paired up in some guides I've 
looked over, so I began to wonder if that was the common scenario.



Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-21 Thread matthew.gaetano
Liblognorm is love, Liblognorm is life


To echo Dave, $currentjob uses REK to provide services to various $client
at anywhere from 60-80k mps in realtime, plus spikes upwards of over 100k
mps. For redundancy (load balancing - waste not want not) we use two nodes
for the rsyslog but one node could easily handle the whole load. I can't get
that kind of vertical scalability with other tools; doing so often ends up
having farms of dozens of instances running (if not because of the load
itself, then because of the sequential regex parsing).

It can take some getting used to writing Liblognorm rules but once you've
written a few you get the hang of it quickly; it's not any more effort than
learning Grok (Graylog, Logstash, regex fanatic product x). While it can't
always do everything more traditional regex based applications can, a gap
that shows more the more complicated you get, it's worth the time and effort
learning and using Liblognorm. Not only in rsyslog but in other applications
as well (I am currently using it in NiFi).

Liblognorm needs more users and more attention, I implore you to use it :)



-
~Regards

Matthew Gaetano
--
Sent from: http://rsyslog-users.1305293.n2.nabble.com/


Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-21 Thread David Lang

On Wed, 21 Feb 2018, deoren wrote:


On 2/20/2018 6:58 PM, David Lang wrote:

On 2/20/2018 6:39 PM, deoren wrote:

In this case, my specific goal is to look for log messages containing 
"SPECIFIC_PATTERN_HERE" (as shown in sample log message) and if a match is 
found parse the message to pull out specific values. Those values are then 
used to generate a notification for our ticketing system (e.g., specific 
URL patterns indicate abuse that we need to review further before our 
vendor contacts us and threatens to cut off service). In this case we're 
not matching a possible range of patterns, but a very specific string that 
is known to us.


you don't need to do this two stage approach (detect a pattern, then parse 
the log) with liblognorm. Instead you just create rules for all your logs 
that include the various patterns that you want to match, and liblognorm 
uses whichever one matches. The two-stage approach is needed with regexes 
because they are so expensive to evaluate, but since liblognorm rules 
are so fast, it makes far more sense to just define all the rules.


Do you recommend running mmnormalize as close to the source as possible or on 
the primary receiver? I'm guessing the former so that the rules are run on 
the original source and not on content that may have been modified by other 
receivers in transit?


there are arguments both ways.

running it close to the source distributes the work (but if you run it on the 
machine that has the source, it is some extra load)


but the resulting json is typically a bit larger than the original message (not 
always, but typically) and so it can take more network bandwidth to send the 
result.


liblognorm is so fast you really have to use it to believe it. At $lastjob I had 
a 1400 line ruleset handling >100K logs/sec without the liblognorm effort being 
noticeable


Is mmnormalize primarily intended for content ingested via imfile or is it 
pretty standard to apply mmnormalize to all inputs? Perhaps just the inputs 
where you expected unstructured log content to be ingested?


it is very much NOT limited to imfile, it's the general purpose tool to convert 
unstructured log content to a normalized format.



Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-21 Thread deoren

On 2/20/2018 10:28 PM, Andrew Griffin via rsyslog wrote:

I’ll second David and say that mmnormalize is your better option.  Though 
whenever I get in a discussion about troubleshooting regex I always make a 
point to recommend the Regex Rx app (if you’re a Mac user):

https://itunes.apple.com/us/app/regexrx/id498370702?mt=12 


It’s $5, and if you’re writing 1 regex a year or 20 a day, it’s worth every 
single penny.  I’ve been using it for years and it has saved me hundreds of 
hours of troubleshooting and fiddling with regexes

Andrew


Thanks for the feedback.

Unfortunately I don't own any Apple products, but I will keep it in mind 
if I obtain one in the future (it is possible I'll need to support them 
at work at some point, so I may be assigned such a device then).


On a loosely related note, are there any repos of liblognorm rulebases 
that you reference?


I found this one:

https://github.com/rsyslog/liblognorm-rulebases

which in turn linked to these:

* https://github.com/pschiffe/rsyslog-elasticsearch-kibana/tree/master/rsyslog


* https://github.com/beave/sagan-rules

Looking over those samples and the doc should give me plenty to review, 
but I figured I'd ask since I was already responding to your email.


Thanks again for the feedback.

Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-21 Thread deoren

On 2/20/2018 6:58 PM, David Lang wrote:

On 2/20/2018 6:39 PM, deoren wrote:

>>
I've read that mmnormalize is recommended over regexes for performance 
reasons, but I have little experience with liblognorm (other than 
knowing it exists). Am I better off writing a few regex matches like 
I'm doing above or crafting (and testing) liblognorm rulesets, using 
them with mmnormalize to generate a JSON structure and then pulling 
what I want from a JSON structure?


you are better using mmnormalize, with your example you would have a rule

rule=: %ip:ipv4% - %host:word% [%timestamp:char-to:]%]%-:rest%

this would create $!ip, $!host and $!timestamp (note I did this from 
memory, I may have a subtle bug here)


Thanks! I'll give that a try.

In this case, my specific goal is to look for log messages containing 
"SPECIFIC_PATTERN_HERE" (as shown in sample log message) and if a 
match is found parse the message to pull out specific values. Those 
values are then used to generate a notification for our ticketing 
system (e.g., specific URL patterns indicate abuse that we need to 
review further before our vendor contacts us and threatens to cut off 
service). In this case we're not matching a possible range of 
patterns, but a very specific string that is known to us.


you don't need to do this two stage approach (detect a pattern, then 
parse the log) with liblognorm. Instead you just create rules for all 
your logs that include the various patterns that you want to match, and 
liblognorm uses whichever one matches. The two-stage approach is needed 
with regexes because they are so expensive to evaluate, but since 
liblognorm rules are so fast, it makes far more sense to just define all 
the rules.


Do you recommend running mmnormalize as close to the source as possible 
or on the primary receiver? I'm guessing the former so that the rules 
are run on the original source and not on content that may have been 
modified by other receivers in transit?


Is mmnormalize primarily intended for content ingested via imfile or is 
it pretty standard to apply mmnormalize to all inputs? Perhaps just the 
inputs where you expected unstructured log content to be ingested?


I know there are dedicated tools for pattern matching and reporting 
(Graylog is something I'm kicking the tires on and I've heard that 
Riemann is designed for tasks like this), but I was hoping to get some 
basic monitoring in place now with a tool that I'm halfway familiar 
with before attempting to implement other tools for easier management 
of more complex patterns. I've already implemented 4-5 other 
notifications and it's worked well thus far, but I wanted to get input 
from the community to see if I'm going about this the wrong way (first 
using regexes over mmnormalize, then as a secondary issue using 
rsyslog for notifications vs Graylog or Riemann).


in any case, you want to use mmnormalize/liblognorm to parse the files 
into json, and then feed the json into your alerting engine.


Thanks for your help.



Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-21 Thread deoren

On 2/20/2018 6:50 PM, David Lang wrote:
you really should look at using mmnormalize to extract fields from the 
logs, it's FAR faster.


Will do. I was looking over the liblognorm doc last night and it makes a 
little sense. The v2 options look to have expanded the support quite a 
bit, at the cost of some complexity. I get the impression that I'll just 
have to play with it a bit for things to make sense.


Rainer may be able to answer off the top of his head, otherwise we would 
have to dig into the code.


Understood.


Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-20 Thread Andrew Griffin via rsyslog
I’ll second David and say that mmnormalize is your better option.  Though 
whenever I get in a discussion about troubleshooting regex I always make a 
point to recommend the Regex Rx app (if you’re a Mac user): 

https://itunes.apple.com/us/app/regexrx/id498370702?mt=12 


It’s $5, and if you’re writing 1 regex a year or 20 a day, it’s worth every 
single penny.  I’ve been using it for years and it has saved me hundreds of 
hours of troubleshooting and fiddling with regexes

Andrew

> On Feb 20, 2018, at 4:58 PM, David Lang  wrote:
> 
> On Tue, 20 Feb 2018, deoren wrote:
> 
>> On 2/20/2018 6:39 PM, deoren wrote:
>>> I've been attempting to use the re_extract() function quite a bit lately to 
>>> write some simple "filters" for notification purposes. I struggled with the 
>>> syntax for a while until I realized tha the  and have been struggling quite 
>>> a bit with the regex support for the re_extract() function. According to 
>>> the http://www.rsyslog.com/regex/ page (and the re_extract function doc), 
>>> Rsyslog uses POSIX ERE and "optionally" BRE expressions.
>>> * Does anyone have a good guide or reference for the syntax needed?
>>> * How do you switch the regex type from ERE to BRE? At at glance it appears 
>>> that the BRE format is more cumbersome, so I want to make sure that I don't 
>>> unintentionally switch that mode on somehow.
>>> I found the differences between the two briefly described on this page:
>> https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
>>  
>>> Does Rsyslog have complete support for ERE expressions? In other words, if 
>>> I find a guide which covers ERE thoroughly, is that sufficient or are there 
>>> gaps in rsyslog's support for the ERE syntax that I should be aware of?
>>> Thanks.
>> 
>> Addendum to my earlier questions (which are still valid and "open" for 
>> feedback):
>> 
>> Real world example of what I'm working with (single line, likely wrapped by 
>> my mail client):
>> 
>> 123.123.123.123 - abc1234 [20/Feb/2018:10:36:01 -0600] "GET 
>> http://example.org:80/servlet/SPECIFIC_PATTERN_HERE HTTP/1.1" 200 2182 
>> "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) 
>> Chrome/39.0.2171.95 Safari/537.36"
>> 
>> here is a PCRE regex that _seems_ to do what I want:
>> 
>> ^([0-9]+.[0-9]+.[0-9]+.[0-9]+)\s\-\s([A-Za-z0-9]+)\s\[([0-9A-Za-z:\/\s-]+)\]
>> 
>> and provides me with three match group results I can reference:
>> 
>> 1. 123.123.123.123
>> 2. abc1234
>> 3. 20/Feb/2018:10:36:01 -0600
>> 
>> As I understand it, re_extract allows retrieving only a specific match at a 
>> time, so I grab the two I care about like so and save to local variables 
>> (this processing is done on the primary receiver):
>> 
>> set $.remote-ip = re_extract(
>>$msg,
>>"^([0-9]+\\.[0-9]+\\.[0-9]+.[0-9]+)\\s.\\s([A-Za-z0-9]+)\\s",
>>0, 1,
>>'unknown remote ip');
>> 
>> set $.remote-user = re_extract(
>>$msg,
>>"^([0-9]+\\.[0-9]+\\.[0-9]+.[0-9]+)\\s.\\s([A-Za-z0-9]+)\\s",
>>0, 2,
>>'unknown remote user');
>> 
>> 
>> This seems to work and only required escaping the escape character (how 
>> "meta").
>> 
>> I've read that mmnormalize is recommended over regexes for performance 
>> reasons, but I have little experience with liblognorm (other than knowing it 
>> exists). Am I better off writing a few regex matches like I'm doing above or 
>> crafting (and testing) liblognorm rulesets, using them with mmnormalize to 
>> generate a JSON structure and then pulling what I want from a JSON structure?
> 
> you are better off using mmnormalize; with your example you would have a rule
> 
> rule=: %ip:ipv4% - %host:word% [%timestamp:char-to:]%]%-:rest%
> 
> this would create $!ip, $!host and $!timestamp (note I did this from memory, 
> I may have a subtle bug here)
> 
>> In this case, my specific goal is to look for log messages containing 
>> "SPECIFIC_PATTERN_HERE" (as shown in sample log message) and if a match is 
>> found parse the message to pull out specific values. Those values are then 
>> used to generate a notification for our ticketing system (e.g., specific URL 
>> patterns indicate abuse that we need to review further before our vendor 
>> contacts us and threatens to cut off service). In this case we're not 
>> matching a possible range of patterns, but a very specific string that is 
>> known to us.
> 
> you don't need to do this two stage approach (detect a pattern, then parse 
> the log) with liblognorm. Instead you just create rules for all your logs 
> that include the various patterns that you want to match, and liblognorm uses 
> whichever one matches. The two-stage approach is needed with regexes because 
> they are so expensive to evaluate, but since liblognorm rules are so fast, 
> it makes far more sense to just define all the rules.
> 
>> I know there are dedicated tools for pattern matching and reporting (Graylog 
>> is something I'm 

Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-20 Thread David Lang

On Tue, 20 Feb 2018, deoren wrote:


On 2/20/2018 6:39 PM, deoren wrote:
I've been attempting to use the re_extract() function quite a bit lately 
to write some simple "filters" for notification purposes. I struggled 
with the syntax for a while until I realized that the  and have been 
struggling quite a bit with the regex support for the re_extract() 
function. According to the http://www.rsyslog.com/regex/ page (and the 
re_extract function doc), Rsyslog uses POSIX ERE and "optionally" BRE 
expressions.


* Does anyone have a good guide or reference for the syntax needed?
* How do you switch the regex type from ERE to BRE? At a glance it 
appears that the BRE format is more cumbersome, so I want to make sure 
that I don't unintentionally switch that mode on somehow.


I found the differences between the two briefly described on this page:


https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions 



Does Rsyslog have complete support for ERE expressions? In other words, 
if I find a guide which covers ERE thoroughly, is that sufficient or are 
there gaps in rsyslog's support for the ERE syntax that I should be 
aware of?


Thanks.


Addendum to my earlier questions (which are still valid and "open" for 
feedback):


Real world example of what I'm working with (single line, likely wrapped 
by my mail client):


123.123.123.123 - abc1234 [20/Feb/2018:10:36:01 -0600] "GET 
http://example.org:80/servlet/SPECIFIC_PATTERN_HERE HTTP/1.1" 200 2182 
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like 
Gecko) Chrome/39.0.2171.95 Safari/537.36"


here is a PCRE regex that _seems_ to do what I want:

^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s\-\s([A-Za-z0-9]+)\s\[([0-9A-Za-z:\/\s-]+)\]

and provides me with three match group results I can reference:

1. 123.123.123.123
2. abc1234
3. 20/Feb/2018:10:36:01 -0600

As I understand it, re_extract allows retrieving only a specific match 
at a time, so I grab the two I care about like so and save to local 
variables (this processing is done on the primary receiver):


set $.remote-ip = re_extract(
    $msg,
    "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s-\\s([A-Za-z0-9]+)\\s",
    0, 1,
    'unknown remote ip');

set $.remote-user = re_extract(
    $msg,
    "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s-\\s([A-Za-z0-9]+)\\s",
    0, 2,
    'unknown remote user');


This seems to work and only required escaping the escape character (how 
"meta").
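
The double escaping noted above can often be sidestepped. A minimal sketch, assuming re_extract() hands the pattern to the platform's POSIX ERE implementation as the rsyslog pages cited earlier state: bracket expressions such as [.] and [[:space:]] need no backslashes, so nothing has to be doubled inside the RainerScript string. \s is not defined by POSIX ERE itself (glibc accepts it as an extension), so [[:space:]] is also the more portable spelling. The variable names are the ones already used in this message.

```rsyslog
# Same two fields as above, written with POSIX bracket expressions only.
# [.] matches a literal dot; [[:space:]] matches one whitespace character.
set $.remote-ip = re_extract(
    $msg,
    "^([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)[[:space:]]-[[:space:]]([A-Za-z0-9]+)[[:space:]]",
    0, 1,
    "unknown remote ip");

set $.remote-user = re_extract(
    $msg,
    "^([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)[[:space:]]-[[:space:]]([A-Za-z0-9]+)[[:space:]]",
    0, 2,
    "unknown remote user");
```

The third argument (0) selects the first match in the message and the fourth selects the capture group, so the only difference between the two calls is the group number and the fallback string.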


I've read that mmnormalize is recommended over regexes for performance 
reasons, but I have little experience with liblognorm (other than 
knowing it exists). Am I better off writing a few regex matches like I'm 
doing above or crafting (and testing) liblognorm rulesets, using them 
with mmnormalize to generate a JSON structure and then pulling what I 
want from a JSON structure?


you are better off using mmnormalize; with your example you would have a rule

rule=: %ip:ipv4% - %host:word% [%timestamp:char-to:]%]%-:rest%

this would create $!ip, $!host and $!timestamp (note I did this from memory, I 
may have a subtle bug here)
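
To apply a rule like the one above, it has to be loaded through mmnormalize. A rough sketch that is not taken from the thread: the rulebase path is a hypothetical location, the local variable names simply mirror the re_extract() version, and the rule itself carries the same "from memory" caveat.

```rsyslog
# Load the normalization module (rsyslog's interface to liblognorm).
module(load="mmnormalize")

# Hypothetical rulebase path. The file would hold the rule shown above:
#   rule=: %ip:ipv4% - %host:word% [%timestamp:char-to:]%]%-:rest%
action(type="mmnormalize" rulebase="/etc/rsyslog.d/access.rulebase")

# mmnormalize sets $parsesuccess to "OK" when a rule matched the message.
if $parsesuccess == "OK" then {
    # Parsed fields live under $!; copy the two of interest to local variables.
    set $.remote-ip = $!ip;
    set $.remote-user = $!host;
}
```

Messages that match no rule leave $parsesuccess set to "FAIL", so the block is skipped for them.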


In this case, my specific goal is to look for log messages containing 
"SPECIFIC_PATTERN_HERE" (as shown in sample log message) and if a match 
is found parse the message to pull out specific values. Those values are 
then used to generate a notification for our ticketing system (e.g., 
specific URL patterns indicate abuse that we need to review further 
before our vendor contacts us and threatens to cut off service). In this 
case we're not matching a possible range of patterns, but a very 
specific string that is known to us.


you don't need to do this two stage approach (detect a pattern, then parse the 
log) with liblognorm. Instead you just create rules for all your logs that 
include the various patterns that you want to match, and liblognorm uses 
whichever one matches. The two-stage approach is needed with regexes because 
they are so expensive to evaluate, but since liblognorm rules are so fast, it 
makes far more sense to just define all the rules.
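
For comparison, the two-stage regex approach mentioned above might look like this in RainerScript. This is a sketch only: it reuses the pattern and variable names from earlier in the thread, with the literal dots and the hyphen spelled out, and the inexpensive "contains" test keeps re_extract() from running on every message.

```rsyslog
# Stage 1: cheap substring test for the known string.
if $msg contains "SPECIFIC_PATTERN_HERE" then {
    # Stage 2: run the expensive regex only on messages that passed stage 1.
    set $.remote-ip = re_extract(
        $msg,
        "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s-\\s([A-Za-z0-9]+)\\s",
        0, 1,
        "unknown remote ip");

    set $.remote-user = re_extract(
        $msg,
        "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s-\\s([A-Za-z0-9]+)\\s",
        0, 2,
        "unknown remote user");
}
```

With liblognorm the gate becomes unnecessary, because each message is matched against the whole rulebase in a single pass.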


I know there are dedicated tools for pattern matching and reporting 
(Graylog is something I'm kicking the tires on and I've heard that 
Riemann is designed for tasks like this), but I was hoping to get some 
basic monitoring in place now with a tool that I'm halfway familiar with 
before attempting to implement other tools for easier management of more 
complex patterns. I've already implemented 4-5 other notifications and 
it's worked well thus far, but I wanted to get input from the community 
to see if I'm going about this the wrong way (first using regexes over 
mmnormalize, then as a secondary issue using rsyslog for notifications 
vs Graylog or Riemann).


in any case, you want to use mmnormalize/liblognorm to parse the files into 
json, and then feed the json into your alerting engine.
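
One way to hand the parsed result on, sketched under assumptions: the rulebase path, the template name, and the output file are all hypothetical, and a file is used only as a stand-in for whatever input the alerting engine accepts.

```rsyslog
module(load="mmnormalize")

# Emit the whole parsed field tree ($!) as one JSON object per line.
template(name="parsedJson" type="string" string="%$!all-json%\n")

# Hypothetical rulebase path.
action(type="mmnormalize" rulebase="/etc/rsyslog.d/access.rulebase")

if $parsesuccess == "OK" then {
    # Stand-in output; swap in the output module your alerting engine needs.
    action(type="omfile" file="/var/log/access-parsed.json" template="parsedJson")
}
```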

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rs

Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-20 Thread David Lang
you really should look at using mmnormalize to extract fields from the logs, 
it's FAR faster.


Rainer may be able to answer off the top of his head, otherwise we would have to 
dig into the code.


David Lang

On Tue, 20 Feb 2018, deoren wrote:

I've been attempting to use the re_extract() function quite a bit lately to 
write some simple "filters" for notification purposes. I struggled with the 
syntax for a while until I realized that the  and have been struggling quite a 
bit with the regex support for the re_extract() function. According to the 
http://www.rsyslog.com/regex/ page (and the re_extract function doc), Rsyslog 
uses POSIX ERE and "optionally" BRE expressions.


* Does anyone have a good guide or reference for the syntax needed?
* How do you switch the regex type from ERE to BRE? At a glance it appears 
that the BRE format is more cumbersome, so I want to make sure that I don't 
unintentionally switch that mode on somehow.


I found the differences between the two briefly described on this page:

https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions

Does Rsyslog have complete support for ERE expressions? In other words, if I 
find a guide which covers ERE thoroughly, is that sufficient or are there 
gaps in rsyslog's support for the ERE syntax that I should be aware of?


Thanks.
___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T 
LIKE THAT.





Re: [rsyslog] Anyone have any good guides for the specific regex format/syntax required for re_extract() ?

2018-02-20 Thread deoren

On 2/20/2018 6:39 PM, deoren wrote:
I've been attempting to use the re_extract() function quite a bit lately 
to write some simple "filters" for notification purposes. I struggled 
with the syntax for a while until I realized that the  and have been 
struggling quite a bit with the regex support for the re_extract() 
function. According to the http://www.rsyslog.com/regex/ page (and the 
re_extract function doc), Rsyslog uses POSIX ERE and "optionally" BRE 
expressions.


* Does anyone have a good guide or reference for the syntax needed?
* How do you switch the regex type from ERE to BRE? At a glance it 
appears that the BRE format is more cumbersome, so I want to make sure 
that I don't unintentionally switch that mode on somehow.


I found the differences between the two briefly described on this page:

https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions 



Does Rsyslog have complete support for ERE expressions? In other words, 
if I find a guide which covers ERE thoroughly, is that sufficient or are 
there gaps in rsyslog's support for the ERE syntax that I should be 
aware of?


Thanks.


Addendum to my earlier questions (which are still valid and "open" for 
feedback):


Real world example of what I'm working with (single line, likely wrapped 
by my mail client):


123.123.123.123 - abc1234 [20/Feb/2018:10:36:01 -0600] "GET 
http://example.org:80/servlet/SPECIFIC_PATTERN_HERE HTTP/1.1" 200 2182 
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like 
Gecko) Chrome/39.0.2171.95 Safari/537.36"


here is a PCRE regex that _seems_ to do what I want:

^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s\-\s([A-Za-z0-9]+)\s\[([0-9A-Za-z:\/\s-]+)\]

and provides me with three match group results I can reference:

1. 123.123.123.123
2. abc1234
3. 20/Feb/2018:10:36:01 -0600

As I understand it, re_extract allows retrieving only a specific match 
at a time, so I grab the two I care about like so and save to local 
variables (this processing is done on the primary receiver):


set $.remote-ip = re_extract(
    $msg,
    "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s-\\s([A-Za-z0-9]+)\\s",
    0, 1,
    'unknown remote ip');

set $.remote-user = re_extract(
    $msg,
    "^([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s-\\s([A-Za-z0-9]+)\\s",
    0, 2,
    'unknown remote user');


This seems to work and only required escaping the escape character (how 
"meta").


I've read that mmnormalize is recommended over regexes for performance 
reasons, but I have little experience with liblognorm (other than 
knowing it exists). Am I better off writing a few regex matches like I'm 
doing above or crafting (and testing) liblognorm rulesets, using them 
with mmnormalize to generate a JSON structure and then pulling what I 
want from a JSON structure?


In this case, my specific goal is to look for log messages containing 
"SPECIFIC_PATTERN_HERE" (as shown in sample log message) and if a match 
is found parse the message to pull out specific values. Those values are 
then used to generate a notification for our ticketing system (e.g., 
specific URL patterns indicate abuse that we need to review further 
before our vendor contacts us and threatens to cut off service). In this 
case we're not matching a possible range of patterns, but a very 
specific string that is known to us.


I know there are dedicated tools for pattern matching and reporting 
(Graylog is something I'm kicking the tires on and I've heard that 
Riemann is designed for tasks like this), but I was hoping to get some 
basic monitoring in place now with a tool that I'm halfway familiar with 
before attempting to implement other tools for easier management of more 
complex patterns. I've already implemented 4-5 other notifications and 
it's worked well thus far, but I wanted to get input from the community 
to see if I'm going about this the wrong way (first using regexes over 
mmnormalize, then as a secondary issue using rsyslog for notifications 
vs Graylog or Riemann).
