2015-12-06 0:29 GMT+01:00 Peter Portante <[email protected]>:
> On Sat, Dec 5, 2015 at 5:03 PM, David Lang <[email protected]> wrote:
>
>> we really need mmscrubnames or similar
>>
>> 1. change all names to lower case
>> 2. replace characters that rsyslog doesn't allow in names with something
>> 3. allow other characters to be added to the list to be replaced
>> 4. change names that are foo!bar into multi-layer structures
>> 5. handle the case where these changes create multiple objects with the
>> same name (probably by appending a string until there are no longer
>> conflicts)
>>
>> #1 may be able to go away in a decade or so if we allow case sensitive
>> names as an option
>>
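A minimal Python sketch of items 1, 2, 3 and 5 from the list above, only to make the intended behaviour concrete. It is not rsyslog code: the allowed character set, the function names scrub_name/scrub_names and the "_" replacement and suffix are all assumptions made for illustration, and it handles a flat object only.

```python
import re

# Assumption: anything outside this set counts as "not allowed in names".
# The real rsyslog rules may differ.
DISALLOWED = re.compile(r"[^a-z0-9_!-]")


def scrub_name(name, extra_chars="", replacement="_"):
    """Lowercase a name (#1), then replace disallowed (#2) and extra (#3) characters."""
    name = name.lower()
    name = DISALLOWED.sub(replacement, name)
    for ch in extra_chars:
        name = name.replace(ch, replacement)
    return name


def scrub_names(obj, extra_chars="", suffix="_"):
    """Scrub every key of a flat dict; on a conflict, append a suffix until unique (#5)."""
    result = {}
    for key, value in obj.items():
        new_key = scrub_name(key, extra_chars)
        while new_key in result:
            new_key += suffix
        result[new_key] = value
    return result


print(scrub_names({"Foo": 1, "foo": 2}))   # {'foo': 1, 'foo_': 2}
print(scrub_names({"resp.code": 200}))     # {'resp_code': 200}
```

Appending a suffix keeps both values when two names collapse to the same scrubbed name, at the cost of a name the sender never used.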
>
> Don't we need to make this go away sooner rather than later?  If rsyslog is the
> link in the chain that prevents someone from getting the key names they
> expect into ES, won't they find something else to replace that link?

Yes, but there is only so much time in a day. I really don't like
the attitude of projects that break their API and expect that
everybody immediately jumps on updating their stuff to work with their
broken new version. I know, life is not fair, but as I said ... time
is a problem.

>
> I have made available RPMs for EPEL 7 (which should work on RHEL 7 and
> CentOS 7), and Fedora 21, 22, and 23.  Why not make the effort to find out
> what breaks, and put in a switch so that folks can opt-in to case-sensitive
> names in config files?  I'd be happy to implement the switch, but would
> need help verifying existing configurations work.

I am right now adding at least a small test for what needs to be
preserved. I would suggest that Brian tries your branch. I will not,
however, merge it into 8.15 because there is a big risk that we break
existing things and I have no time at all next week to evaluate this
in depth. And I don't want to include a potential problem in a
"holiday release". I really don't like the idea that some of us need
to wreck their xmas vacation because of this.

>
>
>>
>> #4 is needed because project lumberjack defined names to be able to be
>> foo!bar instead of foo { bar } to allow things that don't understand
>> multi-dimension structures to still use them (and I need it to handle a
>> similar problem from my mmnormalize rulesets). We have talked about having
>> this be a post-parser pass in liblognorm, but it really should be available
>> no matter where the field names originate (currently mmnormalize,
>> mmjsonparse, or mmexternal I believe)
>>
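To illustrate item 4, here is a small Python sketch that turns flat foo!bar names into a nested structure. The function name nest_names and the choice to raise an error on a clash are assumptions for illustration; a real module would need a policy for clashes rather than an exception.

```python
def nest_names(flat, sep="!"):
    """Turn {"foo!bar": 1} into {"foo": {"bar": 1}}.

    Raises ValueError when one name needs a key to be both a value and a
    container (e.g. "foo" and "foo!bar" in the same object).
    """
    nested = {}
    for name, value in flat.items():
        parts = name.split(sep)
        node = nested
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{name!r}: {part!r} already holds a value")
            node = child
        leaf = parts[-1]
        if leaf in node:
            raise ValueError(f"{name!r}: {leaf!r} is already in use")
        node[leaf] = value
    return nested


print(nest_names({"foo!bar": 1, "foo!baz": 2, "qux": 3}))
# {'foo': {'bar': 1, 'baz': 2}, 'qux': 3}
```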
>
> imjournal also creates json variables, but I don't think it uses a
> hierarchy.
>
>
>>
>> #2 needs to be done on the actual variable names, not just on the ES
>> output so that the variables can be accessed and manipulated in rsyslog
>>
>
> Why do we need to do this?  Is this because we need to reference them in
> the configuration files?  If so, why not provide an escape syntax for the
> configuration file?
>
> Do we really want rsyslog in the position where it adds restrictions to the
> data handling pipeline because of how it operates?  I think we all agree
> that an mmscrubnames module would be good to help put rsyslog in the
> position of transforming data from one source to another in the overall
> pipeline.
>
>
>>
>> #5 needs to be done or {"Foo":1,"foo":2} will lose data
>
>
>> and finally, #4 is needed to allow the work-around for problems like ES
>> has.
>>
>
> I am not sure I follow why this allows us to work-around problems like ES
> has.
>
> The dots in field names are confusing and ambiguous in ES because you can
> reference a hierarchical set of objects in the json objects indexed.  So if
> one has a field name with dots in it in one document and another document
> in the index has a hierarchy with sub objects, then it is ambiguous which
> we are dealing with, if I understand the problem correctly.
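The ambiguity described above can be shown with a short Python sketch. It only illustrates the naming problem (two differently shaped documents yielding the same dotted path); it does not model how Elasticsearch actually builds its mappings, and dotted_paths is a hypothetical helper.

```python
def dotted_paths(doc, prefix=""):
    """Return the set of dotted paths leading to the leaf values of a dict."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths |= dotted_paths(value, path)
        else:
            paths.add(path)
    return paths


flat_doc = {"resp.code": 200}          # one field whose name contains a dot
nested_doc = {"resp": {"code": 200}}   # a sub-object "resp" with a field "code"

print(dotted_paths(flat_doc))                              # {'resp.code'}
print(dotted_paths(flat_doc) == dotted_paths(nested_doc))  # True
```

Both documents are addressed as resp.code, so the path alone cannot tell which shape is meant.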

but can't they escape that? ;)

Rainer

>
> Folks can stay with ES 1.7 if they need the dots in names.
>
>
>>
>> #1,2,3,5 are all extremely similar, once the code is available to do one,
>> all the others become trivial. #4 is a bit more complex, but I think it's
>> similar enough to do in the same module.
>>
>> David Lang
>>
>> On Sat, 5 Dec 2015, Rainer Gerhards wrote:
>>
>>> Date: Sat, 5 Dec 2015 22:42:17 +0100
>>> From: Rainer Gerhards <[email protected]>
>>> Reply-To: rsyslog-users <[email protected]>
>>> To: rsyslog-users <[email protected]>
>>> Subject: Re: [rsyslog] elasticsearch 2.0 and field names
>>>
>>>
>>> Can you file a feature request for omelasticsearch on github? I guess a
>>> quick but useful hack could be done there.
>>>
>>> Rainer
>>> On 05.12.2015 16:15, "Brian Knox" <[email protected]> wrote:
>>>
>>>> David - yes, that exactly describes the situation that I'm in. If I can't
>>>> find a short term solution with existing capabilities, I may look into
>>>> providing a load balanced pool of sanitization workers that I connect to
>>>> over the zeromq plugins I've been working on as a more near term
>>>> solution. Ideally, I'd like to be able to handle the sanitization within
>>>> rsyslog itself.
>>>>
>>>> For a quick hack, a template on my output from my aggregators replacing
>>>> "." characters with "_" might work and I'll give that a spin.  I still
>>>> have an elasticsearch 1.5 cluster that is our production cluster in
>>>> parallel with the new 2.1 cluster, so I have some room to experiment.
>>>>
>>>> As an aside - does anyone have a link to a config example using a regex
>>>> replace on a property using the new v8 template format?
>>>>
>>>> Peter - I'd be very interested if you have an approach to this problem
>>>> that works with existing syslog capability.
>>>>
>>>> Cheers,
>>>> Brian
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 4, 2015 at 3:28 PM, Peter Portante <[email protected]> wrote:
>>>>
>>>>> On Fri, Dec 4, 2015 at 3:00 PM, David Lang <[email protected]> wrote:
>>>>>
>>>>>> On Fri, 4 Dec 2015, Peter Portante wrote:
>>>>>>
>>>>>>> On Fri, Dec 4, 2015 at 12:40 PM, Brian Knox <[email protected]> wrote:
>>>>>>>
>>>>>>>> In my case, I have "flat" ( 1 level deep ) CEE JSON logs with field
>>>>>>>> names that are dot delimited  (  @cee { "resp.duration_ms" : 10000,
>>>>>>>> "resp.code" : 200 }  ).
>>>>>>>
>>>>>>> So if you have a "flat" namespace where the fields include dots in
>>>>>>> them, then if you move to a hierarchical namespace then won't the
>>>>>>> field name references still work?
>>>>>>
>>>>>> the problem he's having is that the field names in his incoming logs are
>>>>>> not hierarchical. He's not hand-crafting the structure the way you are,
>>>>>> he's parsing incoming logs and then outputting $! to ES (or something
>>>>>> similar)
>>>>>>
>>>>>> As such, he's pretty much stuck with the names on the incoming data.
>>>>>
>>>>> We are using rsyslog to normalize the data.  I'll post an example config
>>>>> file for what we are doing shortly (prolly on github).
>>>>>
>>>>> -peter
>>>>>
>>>>>
>>>>>
>>>>>> Rsyslog hasn't had a requirement before now to change/sanitize the field
>>>>>> names, so there's nothing set up to do this.
>>>>>>
>>>>>> the work-around that I can think of basically involves re-parsing the
>>>>>> message after manipulating it.
>>>>>>
>>>>>> you could use mmexternal to pass the json data to an external script that
>>>>>> can muck with the names and pass them back. unfortunately this interface
>>>>>> can't delete fields, just alter or add them, so you would want to do
>>>>>> something along the lines of moving everything down a level so instead of
>>>>>> $!blah you have $!fixed!blah (or in json instead of { 'blah': 'value',
>>>>>> 'foo': 'value' } you would have { "fixed": { "blah": "value", "foo":
>>>>>> "value" } })
>>>>>>
>>>>>> another possibility would be to do something in rsyslog where you use a
>>>>>> template to replace all '.' with some other character, and then parse the
>>>>>> result with mmnormalize, but this is ugly as well.
>>>>>>
>>>>>> We've got a few cases where field names just don't work (case sensitivity,
>>>>>> () in field names, etc), so it may be a good idea for someone to write a
>>>>>> mm (message modification) module that goes through all the field names and
>>>>>> sanitizes them, with several options as to what to do (and especially what
>>>>>> to do if the sanitized version already exists, overwrite, try a different
>>>>>> name, ??)
>>>>>>
>>>>>> David Lang
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> Given my lack of control over the incoming logs, I think the simplest
>>>>>>>> solution to this issue would be a way to change the attribute names
>>>>>>>> themselves  ( "resp_duration_ms", "resp_code" ).
>>>>>>>> Given that I don't know the total space of all possible keys, I'd like
>>>>>>>> this to work with the $!all-json property.
>>>>>>>>
>>>>>>>> If there's not already a way to do this that I'm missing, I think given
>>>>>>>> the change in elasticsearch and that the suggested solution to this
>>>>>>>> problem is "use logstash", I'd like to look at the possibility of adding
>>>>>>>> a property formatter that could handle this sanitization.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Dec 4, 2015 at 11:37 AM, Peter Portante <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> We are using sub-objects:
>>>>>>>>>
>>>>>>>>> # this is for index names to be like: logstash-YYYY.MM.DD
>>>>>>>>> # WARNING: any rsyslog collecting host MUST be running UTC
>>>>>>>>> #          if the proper index is to be chosen to hold the
>>>>>>>>> #          log entry. If you are running EDT, e.g., then
>>>>>>>>> #          the previous day's index will be chosen even
>>>>>>>>> #          though the UTC value is the current day, because
>>>>>>>>> #          the pattern logic does not convert "timereported"
>>>>>>>>> #          to a UTC value before pulling data out of it.
>>>>>>>>> template(name="logstash-index-pattern" type="list") {
>>>>>>>>>     constant(value="logstash-")
>>>>>>>>>     property(name="timereported" dateFormat="rfc3339"
>>>>>>>>> position.from="1" position.to="4")
>>>>>>>>>     constant(value=".")
>>>>>>>>>     property(name="timereported" dateFormat="rfc3339"
>>>>>>>>> position.from="6" position.to="7")
>>>>>>>>>     constant(value=".")
>>>>>>>>>     property(name="timereported" dateFormat="rfc3339"
>>>>>>>>> position.from="9" position.to="10")
>>>>>>>>>     }
>>>>>>>>> # this is for formatting our syslog data in JSON with @timestamp using
>>>>>>>>> # a "hierarchical" metadata namespace
>>>>>>>>> template(name="com-redhat-rsyslog-hier"
>>>>>>>>>          type="list") {
>>>>>>>>>     constant(value="{")
>>>>>>>>>     constant(value="\"@timestamp\":\"")
>>>>>>>>>     property(name="timereported" dateFormat="rfc3339")
>>>>>>>>>     constant(value="\",\"@version\":\"2015.09.24-0")
>>>>>>>>>     constant(value="\",\"message\":\"")
>>>>>>>>>     property(name="$.msg" format="json")
>>>>>>>>>     constant(value="\",\"hostname\":\"")
>>>>>>>>>     property(name="$.hostname")
>>>>>>>>>     constant(value="\",\"level\":\"")
>>>>>>>>>     property(name="$.level")
>>>>>>>>>     constant(value="\",\"pid\":\"")
>>>>>>>>>     property(name="$.pid")
>>>>>>>>>     constant(value="\",\"tags\":\"")
>>>>>>>>>     property(name="$.tags")
>>>>>>>>>     constant(value="\",\"CEE\":")
>>>>>>>>>     property(name="$!all-json")
>>>>>>>>>     constant(value=",\"systemd\":")
>>>>>>>>>     property(name="$.systemd")
>>>>>>>>>     constant(value=",\"rsyslog\":")
>>>>>>>>>     property(name="$.rsyslog")
>>>>>>>>>     constant(value="}\n")
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Dec 4, 2015 at 10:44 AM, Brian Knox <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I found out today that elasticsearch 2.x does not allow field names to
>>>>>>>>>> have the period character in them.  This is making my life interesting
>>>>>>>>>> as I use elasticsearch with rsyslog end to end (no logstash), and a lot
>>>>>>>>>> of our field names have "." as a delimiter in them.
>>>>>>>>>>
>>>>>>>>>> In a perfect world, I'd like an "elasticsearch" property formatter that
>>>>>>>>>> could look for and replace "." in field names with "_", that would also
>>>>>>>>>> work with the all-json property, something like:
>>>>>>>>>>
>>>>>>>>>> property(name="$!all-json" format="elasticsearch")
>>>>>>>>>>
>>>>>>>>>> Or, if this is too ES specific for rsyslog core, perhaps we could add
>>>>>>>>>> this functionality to the omelasticsearch output itself (I'll look over
>>>>>>>>>> the code today).
>>>>>>>>>>
>>>>>>>>>> I'd like to not have to introduce logstash to my environment just to
>>>>>>>>>> regex a character in field names.  I'm open to other ideas as well, just
>>>>>>>>>> wanted to start the conversation.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Brian
>>>>>>>>>> _______________________________________________
>>>>>>>>>> rsyslog mailing list
>>>>>>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>>>>>>> http://www.rsyslog.com/professional-services/
>>>>>>>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>>>>>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>>>>>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>>>>>>>> DON'T LIKE THAT.