On Sat, Dec 5, 2015 at 5:03 PM, David Lang <[email protected]> wrote:

> we really need mmscrubnames or similar
>
> 1. change all names to lower case
> 2. replace characters that rsyslog doesn't allow in names with something
> 3. allow other characters to be added to the list to be replaced
> 4. change names that are foo!bar into multi-layer structures
> 5. handle the case where these changes create multiple objects with the
> same name (probably by appending a string until there are no longer
> conflicts)
>
> #1 may be able to go away in a decade or so if we allow case sensitive
> names as an option
>
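A minimal Python sketch of steps #1-#3, first for a single key and then for a whole JSON tree. mmscrubnames is only a proposal in this thread, so `scrub_name` and `scrub_tree` are hypothetical names. The allowed-character set is an assumption standing in for whatever rsyslog actually accepts in names.

```python
import re

# Assumption: allowed names use lower-case ASCII letters, digits, "_" and "-".
# This set is illustrative only, not rsyslog's actual rule.
DISALLOWED = re.compile(r"[^a-z0-9_-]")


def scrub_name(name, extra_chars="", replacement="_"):
    """Hypothetical mmscrubnames steps #1-#3 applied to one key."""
    name = name.lower()                        # step 1: fold to lower case
    for ch in extra_chars:                     # step 3: user-listed characters
        name = name.replace(ch, replacement)
    return DISALLOWED.sub(replacement, name)   # step 2: disallowed characters


def scrub_tree(obj, **opts):
    """Apply scrub_name to every key of a nested JSON-like value.

    Collisions (step 5) are NOT handled here: if two keys scrub to the
    same name, the later one silently overwrites the earlier one.
    """
    if isinstance(obj, dict):
        return {scrub_name(k, **opts): scrub_tree(v, **opts)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [scrub_tree(v, **opts) for v in obj]
    return obj
```

For example, `scrub_tree({"Resp.Code": 200, "resp(ms)": 5})` gives `{"resp_code": 200, "resp_ms_": 5}`. Passing `extra_chars="-"` would also replace hyphens, even though the assumed base set allows them.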

Don't we need to make this go away sooner rather than later? If rsyslog is the
link in the chain that prevents someone from getting the key names they
expect into ES, won't they find something else to replace that link?

I have made available RPMs for EPEL 7 (which should work on RHEL 7 and
CentOS 7) and for Fedora 21, 22, and 23. Why not make the effort to find out
what breaks, and put in a switch so that folks can opt in to case-sensitive
names in config files? I'd be happy to implement the switch, but would
need help verifying that existing configurations still work.


>
> #4 is needed because Project Lumberjack defined names so that they can be
> written as foo!bar instead of foo { bar }, allowing things that don't
> understand multi-dimensional structures to still use them (and I need it to
> handle a similar problem from my mmnormalize rulesets). We have talked about
> making this a post-parser pass in liblognorm, but it really should be
> available no matter where the field names originate (currently mmnormalize,
> mmjsonparse, or mmexternal, I believe).
>
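Step #4 could look roughly like the following Python sketch, which expands separator-delimited names into nested objects. `expand_names` is a hypothetical helper, not part of liblognorm or rsyslog. The separator is a parameter so the same idea also covers the dotted names that come up later in the thread.

```python
import copy


def expand_names(flat, sep="!"):
    """Hypothetical step #4: {"foo!bar": 1} -> {"foo": {"bar": 1}}.

    Only top-level keys are expanded. A ValueError is raised when a key
    needs a sub-object where a plain value already sits (or vice versa),
    which is the kind of conflict step #5 would have to resolve.
    """
    result = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = result
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r}: {part!r} is already a plain value")
        if leaf in node:
            raise ValueError(f"{key!r} collides with an existing entry")
        # Copy so that later merges never mutate the caller's input.
        node[leaf] = copy.deepcopy(value)
    return result
```

With `sep="."`, `{"resp.code": 200, "resp.duration_ms": 10000}` becomes `{"resp": {"code": 200, "duration_ms": 10000}}`. Raising on conflicts rather than picking a winner leaves the policy choice to whatever handles #5.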

imjournal also creates json variables, but I don't think it uses a
hierarchy.


>
> #2 needs to be done on the actual variable names, not just on the ES
> output so that the variables can be accessed and manipulated in rsyslog
>

Why do we need to do this?  Is this because we need to reference them in
the configuration files?  If so, why not provide an escape syntax for the
configuration file?

Do we really want rsyslog in the position where it adds restrictions to the
data handling pipeline because of how it operates?  I think we all agree
that an mmscrubnames module would be good to help put rsyslog in the
position of transforming data from one source to another in the overall
pipeline.


>
> #5 needs to be done or {"Foo":1,"foo":2} will lose data
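To make the #5 concern concrete, consider two hypothetical Python helpers. Lower-casing keys by simply rebuilding a dict keeps only the last of the colliding keys. Appending a string until the name is free, the approach suggested for #5, keeps both values.

```python
def lower_keys_naive(obj):
    """Step #1 without collision handling: later keys overwrite earlier ones."""
    return {k.lower(): v for k, v in obj.items()}


def lower_keys_unique(obj, suffix="_"):
    """Lower-case keys, appending `suffix` until each name is unused."""
    out = {}
    for key, value in obj.items():
        new = key.lower()
        while new in out:
            new += suffix
        out[new] = value
    return out
```

`lower_keys_naive({"Foo": 1, "foo": 2})` returns `{"foo": 2}`, so the value 1 is lost. `lower_keys_unique` returns `{"foo": 1, "foo_": 2}`. Note that the resulting names depend on the order in which keys arrive.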


> and finally, #4 is needed to allow the work-around for problems like the one
> ES has.
>

I am not sure I follow why this allows us to work around problems like the
one ES has.

The dots in field names are confusing and ambiguous in ES because ES uses
dots to reference nested objects inside the indexed JSON documents. So if
one document has a field name with dots in it, and another document in the
same index has a hierarchy with sub-objects along the same path, it is
ambiguous which of the two a dotted name refers to, if I understand the
problem correctly.
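The following simplified Python model illustrates the ambiguity described above. If every leaf is addressed by its dotted path, a flat field named "resp.code" and a nested resp/code object end up with the same path. This only illustrates the naming clash; it is not how Elasticsearch maps fields internally.

```python
def dotted_paths(doc, prefix=""):
    """Yield the dotted path of every leaf value in a nested dict."""
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from dotted_paths(value, path + ".")
        else:
            yield path


flat_doc = {"resp.code": 200}           # dots inside a single field name
nested_doc = {"resp": {"code": 200}}    # a real sub-object

# Both documents yield the same path, so "resp.code" alone cannot tell
# which shape was meant.
ambiguous = set(dotted_paths(flat_doc)) & set(dotted_paths(nested_doc))
```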

Folks can stay with ES 1.7 if they need the dots in names.


>
> #1, 2, 3, and 5 are all extremely similar; once the code is available to do
> one, all the others become trivial. #4 is a bit more complex, but I think
> it's similar enough to do in the same module.
>
> David Lang
>
> On Sat, 5 Dec 2015, Rainer Gerhards wrote:
>
>> Date: Sat, 5 Dec 2015 22:42:17 +0100
>> From: Rainer Gerhards <[email protected]>
>> Reply-To: rsyslog-users <[email protected]>
>> To: rsyslog-users <[email protected]>
>> Subject: Re: [rsyslog] elasticsearch 2.0 and field names
>>
>> Can you file a feature request for omelasticsearch on GitHub? I guess a
>> quick but useful hack could be done there.
>>
>> Rainer
>>
>> On 05.12.2015 16:15, "Brian Knox" <[email protected]> wrote:
>>
>>> David - yes, that exactly describes the situation that I'm in. If I can't
>>> find a short-term solution with existing capabilities, I may look into
>>> providing a load-balanced pool of sanitization workers that I connect to
>>> over the zeromq plugins I've been working on, as a more near-term
>>> solution. Ideally, I'd like to be able to handle the sanitization within
>>> rsyslog itself.
>>>
>>> For a quick hack, a template on the output of my aggregators that replaces
>>> "." characters with "_" might work, and I'll give that a spin. I still
>>> have an elasticsearch 1.5 cluster that is our production cluster in
>>> parallel with the new 2.1 cluster, so I have some room to experiment.
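The following Python sketch shows one caveat with the quick hack. If the "." to "_" replacement is applied to the whole serialized $!all-json string, it also rewrites dots inside values such as IP addresses, hostnames, and decimal numbers. Renaming only the keys leaves values alone. The sample record and `replace_in_keys` are illustrative assumptions.

```python
import json


def replace_in_keys(obj, old=".", new="_"):
    """Rename keys only; values are left untouched."""
    if isinstance(obj, dict):
        return {k.replace(old, new): replace_in_keys(v, old, new)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [replace_in_keys(v, old, new) for v in obj]
    return obj


record = {"resp.code": 200, "client.ip": "10.0.0.1"}

# Blanket replacement on the serialized text also mangles the IP value.
blanket = json.loads(json.dumps(record).replace(".", "_"))
# -> {"resp_code": 200, "client_ip": "10_0_0_1"}

keys_only = replace_in_keys(record)
# -> {"resp_code": 200, "client_ip": "10.0.0.1"}
```

A blanket replacement can also produce invalid JSON if a value is a decimal number such as `1.5`.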
>>>
>>> As an aside - does anyone have a link to a config example using a regex
>>> replace on a property using the new v8 template format?
>>>
>>> Peter - I'd be very interested if you have an approach to this problem
>>> that works with existing rsyslog capability.
>>>
>>> Cheers,
>>> Brian
>>>
>>> On Fri, Dec 4, 2015 at 3:28 PM, Peter Portante <[email protected]> wrote:
>>>
>>>> On Fri, Dec 4, 2015 at 3:00 PM, David Lang <[email protected]> wrote:
>>>>
>>>>> On Fri, 4 Dec 2015, Peter Portante wrote:
>>>>>
>>>>>> On Fri, Dec 4, 2015 at 12:40 PM, Brian Knox <[email protected]> wrote:
>>>>>>
>>>>>>> In my case, I have "flat" (1 level deep) CEE JSON logs with field
>>>>>>> names that are dot delimited (@cee { "resp.duration_ms" : 10000,
>>>>>>> "resp.code" : 200 }).
>>>>>>
>>>>>> So if you have a "flat" namespace where the fields include dots in
>>>>>> them, then if you move to a hierarchical namespace, won't the field
>>>>>> name references still work?
>>>>>
>>>>> The problem he's having is that the field names in his incoming logs are
>>>>> not hierarchical. He's not hand-crafting the structure the way you are;
>>>>> he's parsing incoming logs and then outputting $! to ES (or something
>>>>> similar).
>>>>>
>>>>> As such, he's pretty much stuck with the names on the incoming data.
>>>>
>>>> We are using rsyslog to normalize the data. I'll post an example config
>>>> file for what we are doing shortly (prolly on GitHub).
>>>>
>>>> -peter
>>>>
>>>>> Rsyslog hasn't had a requirement before now to change/sanitize the field
>>>>> names, so there's nothing set up to do this.
>>>>>
>>>>> The work-around that I can think of basically involves re-parsing the
>>>>> message after manipulating it.
>>>>>
>>>>> You could use mmexternal to pass the JSON data to an external script
>>>>> that can muck with the names and pass them back. Unfortunately, this
>>>>> interface can't delete fields, just alter or add them, so you would want
>>>>> to do something along the lines of moving everything down a level, so
>>>>> that instead of $!blah you have $!fixed!blah (or in JSON, instead of
>>>>> { "blah": "value", "foo": "value" } you would have
>>>>> { "fixed": { "blah": "value", "foo": "value" } }).
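The following small Python model shows why moving everything down a level helps. It assumes, as described above, that the external interface can add or alter fields but never delete them. `apply_reply` is a hypothetical stand-in for that shallow merge behaviour, not the real plugin's code.

```python
def apply_reply(fields, reply):
    """Model of an interface that may add or alter fields but never delete."""
    merged = dict(fields)
    merged.update(reply)
    return merged


fields = {"resp.code": 200, "resp.duration_ms": 10000}

# Renaming in place: the dotted originals cannot be removed, so they remain.
renamed = apply_reply(fields, {"resp_code": 200, "resp_duration_ms": 10000})

# Moving everything down a level: "fixed" is a clean subtree, so the output
# side only needs to emit that subtree.
moved = apply_reply(fields, {"fixed": {"resp_code": 200,
                                       "resp_duration_ms": 10000}})
```

In `renamed`, the dotted and underscored keys coexist, so a document built from it would still contain dots. In `moved`, `moved["fixed"]` holds only sanitized names.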
>>>>>
>>>>> Another possibility would be to do something in rsyslog where you use a
>>>>> template to replace all '.' with some other character, and then parse
>>>>> the result with mmnormalize, but this is ugly as well.
>>>>>
>>>>> We've got a few cases where field names just don't work (case
>>>>> sensitivity, () in field names, etc.), so it may be a good idea for
>>>>> someone to write an mm (message modification) module that goes through
>>>>> all the field names and sanitizes them, with several options as to what
>>>>> to do (and especially what to do if the sanitized version already
>>>>> exists: overwrite, try a different name, ??).
>>>>>
>>>>> David Lang
>>>>>
>>>>>>> Given my lack of control over the incoming logs, I think the simplest
>>>>>>> solution to this issue would be a way to change the attribute names
>>>>>>> themselves ("resp_duration_ms", "resp_code"). Given that I don't know
>>>>>>> the total space of all possible keys, I'd like this to work with the
>>>>>>> $!all-json property.
>>>>>>>
>>>>>>> If there's not already a way to do this that I'm missing, I think,
>>>>>>> given the change in elasticsearch and that the suggested solution to
>>>>>>> this problem is "use logstash", I'd like to look at the possibility of
>>>>>>> adding a property formatter that could handle this sanitization.
>>>>>>>
>>>>>>> On Fri, Dec 4, 2015 at 11:37 AM, Peter Portante <[email protected]> wrote:
>>>>>>>
>>>>>>>> We are using sub-objects:
>>>>>>>>
>>>>>>>> # this is for index names to be like: logstash-YYYY.MM.DD
>>>>>>>> # WARNING: any rsyslog collecting host MUST be running UTC
>>>>>>>> #          if the proper index is to be chosen to hold the
>>>>>>>> #          log entry. If you are running EDT, e.g., then
>>>>>>>> #          the previous day's index will be chosen even
>>>>>>>> #          though the UTC value is the current day, because
>>>>>>>> #          the pattern logic does not convert "timereported"
>>>>>>>> #          to a UTC value before pulling data out of it.
>>>>>>>> template(name="logstash-index-pattern" type="list") {
>>>>>>>>     constant(value="logstash-")
>>>>>>>>     property(name="timereported" dateFormat="rfc3339"
>>>>>>>>              position.from="1" position.to="4")
>>>>>>>>     constant(value=".")
>>>>>>>>     property(name="timereported" dateFormat="rfc3339"
>>>>>>>>              position.from="6" position.to="7")
>>>>>>>>     constant(value=".")
>>>>>>>>     property(name="timereported" dateFormat="rfc3339"
>>>>>>>>              position.from="9" position.to="10")
>>>>>>>>     }
>>>>>>>> # this is for formatting our syslog data in JSON with @timestamp using
>>>>>>>> # a "hierarchical" metadata namespace
>>>>>>>> template(name="com-redhat-rsyslog-hier"
>>>>>>>>          type="list") {
>>>>>>>>     constant(value="{")
>>>>>>>>     constant(value="\"@timestamp\":\"")
>>>>>>>>     property(name="timereported" dateFormat="rfc3339")
>>>>>>>>     constant(value="\",\"@version\":\"2015.09.24-0")
>>>>>>>>     constant(value="\",\"message\":\"")
>>>>>>>>     property(name="$.msg" format="json")
>>>>>>>>     constant(value="\",\"hostname\":\"")
>>>>>>>>     property(name="$.hostname")
>>>>>>>>     constant(value="\",\"level\":\"")
>>>>>>>>     property(name="$.level")
>>>>>>>>     constant(value="\",\"pid\":\"")
>>>>>>>>     property(name="$.pid")
>>>>>>>>     constant(value="\",\"tags\":\"")
>>>>>>>>     property(name="$.tags")
>>>>>>>>     constant(value="\",\"CEE\":")
>>>>>>>>     property(name="$!all-json")
>>>>>>>>     constant(value=",\"systemd\":")
>>>>>>>>     property(name="$.systemd")
>>>>>>>>     constant(value=",\"rsyslog\":")
>>>>>>>>     property(name="$.rsyslog")
>>>>>>>>     constant(value="}\n")
>>>>>>>>     }
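The index-name template and the UTC warning in its comment can be illustrated in Python. `index_from_rfc3339` mimics the template's position.from/position.to slicing of an rfc3339 timestamp (1-based positions 1-4, 6-7 and 9-10). `index_from_utc` shows the date you would get after converting to UTC first. Both are hypothetical helpers, the sample timestamp is made up, and Python 3.7+ is assumed for `datetime.fromisoformat`.

```python
from datetime import datetime, timezone


def index_from_rfc3339(ts):
    """Slice the local rfc3339 string the way the template does."""
    return f"logstash-{ts[0:4]}.{ts[5:7]}.{ts[8:10]}"


def index_from_utc(ts):
    """Convert to UTC first, so the index follows the UTC calendar day."""
    dt = datetime.fromisoformat(ts).astimezone(timezone.utc)
    return dt.strftime("logstash-%Y.%m.%d")


# 21:30 on Dec 4 at UTC-5 is already 02:30 on Dec 5 in UTC.
local_ts = "2015-12-04T21:30:00-05:00"
```

Here `index_from_rfc3339(local_ts)` yields `logstash-2015.12.04` while `index_from_utc(local_ts)` yields `logstash-2015.12.05`. This is the previous-day mismatch the comment warns about for hosts not running in UTC.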
>>>>>>>>
>>>>>>>> On Fri, Dec 4, 2015 at 10:44 AM, Brian Knox <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I found out today that elasticsearch 2.x does not allow field names to
>>>>>>>>> have the period character in them. This is making my life interesting,
>>>>>>>>> as I use elasticsearch with rsyslog end to end (no logstash), and a lot
>>>>>>>>> of our field names have "." as a delimiter in them.
>>>>>>>>>
>>>>>>>>> In a perfect world, I'd like an "elasticsearch" property formatter that
>>>>>>>>> could look for and replace "." in field names with "_", and that would
>>>>>>>>> also work with the all-json property, something like:
>>>>>>>>>
>>>>>>>>> property(name="$!all-json" format="elasticsearch")
>>>>>>>>>
>>>>>>>>> Or, if this is too ES-specific for rsyslog core, perhaps we could add
>>>>>>>>> this functionality to the omelasticsearch output itself (I'll look
>>>>>>>>> over the code today).
>>>>>>>>>
>>>>>>>>> I'd like to not have to introduce logstash to my environment just to
>>>>>>>>> regex a character in field names. I'm open to other ideas as well; I
>>>>>>>>> just wanted to start the conversation.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Brian
>>>>>>>>> _______________________________________________
>>>>>>>>> rsyslog mailing list
>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>>>>>>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>>>>>>>>> if you DON'T LIKE THAT.