On Sat, 5 Dec 2015, Brian Knox wrote:

David - yes, that exactly describes the situation that I'm in. If I can't
find a short term solution with existing capabilities, I may look into
providing a load balanced pool of sanitization workers that I connect to
over the zeromq plugins I've been working on as a more near term solution.
Ideally, I'd like to be able to handle the sanitization within rsyslog
itself.

For a quick hack, a template on my output from my aggregators replacing "."
characters with "_" might work and I'll give that a spin.  I still have an
elasticsearch 1.5 cluster that is our production cluster in parallel with
the new 2.1 cluster, so I have some room to experiment.

Any word on why ES made this change in 2.x?

David Lang

As an aside - does anyone have a link to a config example using a regex
replace on a property using the new v8 template format?

Peter - I'd be very interested if you have an approach to this problem that
works with existing syslog capability.

Cheers,
Brian




On Fri, Dec 4, 2015 at 3:28 PM, Peter Portante <[email protected]>
wrote:

On Fri, Dec 4, 2015 at 3:00 PM, David Lang <[email protected]> wrote:

On Fri, 4 Dec 2015, Peter Portante wrote:

On Fri, Dec 4, 2015 at 12:40 PM, Brian Knox <[email protected]>
wrote:

In my case, I have "flat" (1 level deep) CEE JSON logs with field names
that are dot delimited ( @cee { "resp.duration_ms" : 10000,
"resp.code" : 200 } ).


So if you have a "flat" namespace where the field names include dots, and
you then move to a hierarchical namespace, won't the field name references
still work?
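
To make the question concrete, here is a minimal Python sketch of what
moving from flat dotted names to a hierarchical namespace could look like.
The function name expand_dotted is made up for illustration; it is not part
of rsyslog or Elasticsearch, and it assumes a flat input object like the
@cee example above.

```python
def expand_dotted(fields):
    """Turn {"resp.code": 200} into {"resp": {"code": 200}}.

    Assumes `fields` is a flat dict (one level deep). Raises ValueError
    when a name is used both as a plain field and as a prefix.
    """
    nested = {}
    for name, value in fields.items():
        parts = name.split(".")
        node = nested
        # Walk (or create) one sub-object per leading path component.
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{name!r} conflicts with plain field {part!r}")
            node = child
        leaf = parts[-1]
        if leaf in node:
            raise ValueError(f"{name!r} conflicts with an existing field")
        node[leaf] = value
    return nested


if __name__ == "__main__":
    flat = {"resp.duration_ms": 10000, "resp.code": 200}
    print(expand_dotted(flat))
```

With sub-objects, a path such as resp.code still names the same value, but
no individual key contains a dot any more.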


The problem he's having is that the field names in his incoming logs are
not hierarchical. He's not hand-crafting the structure the way you are;
he's parsing incoming logs and then outputting $! to ES (or something
similar).

As such, he's pretty much stuck with the names on the incoming data.


We are using rsyslog to normalize the data.  I'll post an example config
file for what we are doing shortly (prolly on github).

-peter



Rsyslog hasn't had a requirement before now to change/sanitize the field
names, so there's nothing set up to do this.

The workaround that I can think of basically involves re-parsing the
message after manipulating it.

You could use mmexternal to pass the JSON data to an external script that
can muck with the names and pass them back. Unfortunately this interface
can't delete fields, just alter or add them, so you would want to do
something along the lines of moving everything down a level, so instead of
$!blah you have $!fixed!blah (or in JSON, instead of { "blah": "value",
"foo": "value" } you would have { "fixed": { "blah": "value", "foo":
"value" } }).
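
As a rough illustration of the renaming step such an external script might
perform, here is a Python sketch. It only shows the data transformation;
how the script exchanges data with rsyslog is deliberately left out. The
function name nest_under_fixed is hypothetical, and "fixed" is just the
placeholder key from the example above.

```python
import json


def nest_under_fixed(fields, container="fixed"):
    """Return a new object holding renamed copies of all fields one level down.

    The original fields are not deleted (the interface described above
    cannot do that); the sanitized copies live under `container` instead.
    """
    sanitized = {name.replace(".", "_"): value for name, value in fields.items()}
    return {container: sanitized}


if __name__ == "__main__":
    incoming = json.loads('{"resp.duration_ms": 10000, "resp.code": 200}')
    print(json.dumps(nest_under_fixed(incoming)))
    # prints {"fixed": {"resp_duration_ms": 10000, "resp_code": 200}}
```

An output template would then reference $!fixed rather than $!, so only the
sanitized names reach Elasticsearch.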

Another possibility would be to do something in rsyslog where you use a
template to replace all '.' with some other character, and then parse the
result with mmnormalize, but this is ugly as well.

We've got a few cases where field names just don't work (case sensitivity,
() in field names, etc.), so it may be a good idea for someone to write a
mm (message modification) module that goes through all the field names and
sanitizes them, with several options as to what to do (and especially what
to do if the sanitized version already exists: overwrite, try a different
name, ??).
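
To show what those options could mean in practice, here is a small Python
sketch of a field-name sanitizer with a selectable collision policy. It is
not an rsyslog module and none of these names exist in rsyslog; the
allowed character set and the policy names ("overwrite", "keep_first",
"rename") are assumptions chosen for the example.

```python
import re

POLICIES = ("overwrite", "keep_first", "rename")


def sanitize_name(name):
    """Lowercase the name and replace anything outside [a-z0-9_@] with '_'."""
    return re.sub(r"[^a-z0-9_@]", "_", name.lower())


def sanitize_fields(fields, on_collision="rename"):
    """Return a copy of `fields` with sanitized names.

    on_collision decides what happens when two names sanitize to the
    same result:
      overwrite  - the later value replaces the earlier one
      keep_first - the later value is dropped
      rename     - the later value gets a numeric suffix (_2, _3, ...)
    """
    if on_collision not in POLICIES:
        raise ValueError(f"unknown collision policy: {on_collision!r}")
    result = {}
    for name, value in fields.items():
        clean = sanitize_name(name)
        if clean in result:
            if on_collision == "keep_first":
                continue
            if on_collision == "rename":
                suffix = 2
                while f"{clean}_{suffix}" in result:
                    suffix += 1
                clean = f"{clean}_{suffix}"
            # "overwrite" falls through and replaces the earlier value.
        result[clean] = value
    return result


if __name__ == "__main__":
    sample = {"Resp.Code": 200, "resp_code": 404, "size(bytes)": 12}
    for policy in POLICIES:
        print(policy, sanitize_fields(sample, on_collision=policy))
```

Which policy is the right default depends on whether losing a value or
changing a name is the lesser evil for the consumer of the data.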

David Lang



Given my lack of control over the incoming logs, I think the simplest
solution to this issue would be a way to change the attribute names
themselves ( "resp_duration_ms", "resp_code" ).
Given that I don't know the total space of all possible keys, I'd like this
to work with the $!all-json property.

If there's not already a way to do this that I'm missing, I think given the
change in elasticsearch and that the suggested solution to this problem is
"use logstash", I'd like to look at the possibility of adding a property
formatter that could handle this sanitization.


On Fri, Dec 4, 2015 at 11:37 AM, Peter Portante <
[email protected]>
wrote:

We are using sub-objects:

# this is for index names to be like: logstash-YYYY.MM.DD
# WARNING: any rsyslog collecting host MUST be running UTC
#          if the proper index is to be chosen to hold the
#          log entry. If you are running EDT, e.g., then
#          the previous day's index will be chosen even
#          though the UTC value is the current day, because
#          the pattern logic does not convert "timereported"
#          to a UTC value before pulling data out of it.
template(name="logstash-index-pattern" type="list") {
    constant(value="logstash-")
    property(name="timereported" dateFormat="rfc3339"
             position.from="1" position.to="4")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339"
             position.from="6" position.to="7")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339"
             position.from="9" position.to="10")
    }
# this is for formatting our syslog data in JSON with @timestamp using
# a "hierarchical" metadata namespace
template(name="com-redhat-rsyslog-hier"
         type="list") {
    constant(value="{")
    constant(value="\"@timestamp\":\"")
    property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"@version\":\"2015.09.24-0")
    constant(value="\",\"message\":\"")
    property(name="$.msg" format="json")
    constant(value="\",\"hostname\":\"")
    property(name="$.hostname")
    constant(value="\",\"level\":\"")
    property(name="$.level")
    constant(value="\",\"pid\":\"")
    property(name="$.pid")
    constant(value="\",\"tags\":\"")
    property(name="$.tags")
    constant(value="\",\"CEE\":")
    property(name="$!all-json")
    constant(value=",\"systemd\":")
    property(name="$.systemd")
    constant(value=",\"rsyslog\":")
    property(name="$.rsyslog")
    constant(value="}\n")
    }
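
The UTC warning in the index-pattern comments above can be illustrated with
a short Python sketch. It mimics the template by slicing the year, month
and day straight out of an RFC 3339 timestamp string, and compares that
with the index name obtained after converting to UTC first. The function
names and the sample timestamp are invented for the example.

```python
from datetime import datetime, timezone


def index_by_slicing(timestamp):
    """Build the index name the way the template does: characters 1-4,
    6-7 and 9-10 of the RFC 3339 string, with no timezone conversion."""
    return f"logstash-{timestamp[0:4]}.{timestamp[5:7]}.{timestamp[8:10]}"


def index_by_utc(timestamp):
    """Build the index name from the UTC date of the same instant."""
    utc = datetime.fromisoformat(timestamp).astimezone(timezone.utc)
    return utc.strftime("logstash-%Y.%m.%d")


if __name__ == "__main__":
    # 21:30 on a host running EDT (UTC-4) is already the next day in UTC.
    stamp = "2015-07-04T21:30:00-04:00"
    print(index_by_slicing(stamp))  # logstash-2015.07.04
    print(index_by_utc(stamp))      # logstash-2015.07.05
```

On a host that already runs UTC the two functions agree, which is why the
comments insist on UTC for every collecting host.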



On Fri, Dec 4, 2015 at 10:44 AM, Brian Knox <[email protected]>
wrote:

I found out today that elasticsearch 2.x does not allow field names to have
the period character in them.  This is making my life interesting as I use
elasticsearch with rsyslog end to end (no logstash), and a lot of our field
names have "." as a delimiter in them.

In a perfect world, I'd like an "elasticsearch" property formatter that
could look for and replace "." in field names with "_", that would also
work with the all-json property, something like:

property(name="$!all-json" format="elasticsearch")

Or, if this is too ES specific for rsyslog core, perhaps we could add this
functionality to the omelasticsearch output itself (I'll look over the code
today).

I'd like to not have to introduce logstash to my environment just to regex
a character in field names.  I'm open to other ideas as well, just wanted
to start the conversation.

Cheers,
Brian
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.
