I propose that we just disallow dots in field names. Dots have a special
meaning in at least some stores (as Ryan notes below, ES treats a dotted field
name as an object field), and as we keep adding data stores we may run into
more unintended behavior. We should have logic in our code that checks for
dots and either auto-corrects them (replace with underscores?) or at least
throws an error or logs a warning.
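
To make the idea concrete, here is a rough sketch of what such a check could look like. This is only an illustration under my own assumptions: FieldNameSanitizer, Policy, and both methods are hypothetical names and do not exist in Metron today, and underscore is just the replacement suggested above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Metron): checks field names for dots. */
final class FieldNameSanitizer {

  /** What to do when a field name contains a dot. */
  enum Policy { REPLACE, WARN, FAIL }

  private static final char DOT = '.';
  private static final char REPLACEMENT = '_'; // assumed replacement character

  /** Returns the field name to store, applying the policy if it contains a dot. */
  static String sanitize(String fieldName, Policy policy) {
    if (fieldName.indexOf(DOT) < 0) {
      return fieldName; // no dot, nothing to do
    }
    switch (policy) {
      case REPLACE:
        return fieldName.replace(DOT, REPLACEMENT);
      case WARN:
        System.err.println("WARN: field name contains a dot: " + fieldName);
        return fieldName;
      default: // FAIL
        throw new IllegalArgumentException("Field name contains a dot: " + fieldName);
    }
  }

  /** Applies sanitize() to every key of a message, keeping insertion order. */
  static Map<String, Object> sanitizeKeys(Map<String, Object> message, Policy policy) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : message.entrySet()) {
      String key = sanitize(entry.getKey(), policy);
      // Replacing dots can make two distinct names collide, e.g. "a.b" and "a_b".
      if (result.containsKey(key)) {
        throw new IllegalStateException("Field name collision after sanitizing: " + key);
      }
      result.put(key, entry.getValue());
    }
    return result;
  }
}
```

With the REPLACE policy, sanitize("source.type", Policy.REPLACE) would return "source_type". Note that auto-correcting can make two different names collide, which is one argument for failing or warning instead of silently rewriting.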
Thanks,
James
07.09.2018, 16:33, "Ryan Merriman":
> Internal means it’s not configurable, doesn’t contain our default separator
> (dots) and is namespaced with metron. We can definitely improve on DRY but
> there’s more to it than that. For example, having 2 different versions of
> this field name (ES and Solr) adds a significant amount of complexity for no
> real benefit.
>
>> On Sep 7, 2018, at 5:12 PM, Michael Miklavcic wrote:
>>
>> Can you elaborate on what you mean by "convert to internal?" From your
>> description, it looks like the challenge is from our violations of DRY when
>> it comes to constants referencing those keys, which would be eliminated by
>> refactoring.
>>
>>> On Fri, Sep 7, 2018, 3:50 PM Ryan Merriman wrote:
>>>
>>> I recently worked on a PR that involved changing the default behavior of
>>> the ElasticsearchWriter to store data using field names with the default
>>> Metron separator, dots. One of the unfortunate consequences of this is
>>> that although dots are allowed in more recent versions of ES, it changes
>>> how these fields are stored. Having a dot in a field name causes ES to
>>> treat it as an object field type. We're not quite comfortable with this
>>> because it could introduce unforeseen side effects that may not be
>>> obvious. Here's the PR: https://github.com/apache/metron/pull/1181
>>>
>>> As I worked through it I noticed there are a couple fields that include
>>> separators where it's not actually necessary. They are not nested by
>>> nature and are internal to Metron. The fact that they are internal means
>>> they show up in constants and are hardcoded in several different places.
>>> That made the work in the PR above much harder and more tedious than it should
>>> have been. There are 2 in particular that I had to deal with: source:type
>>> and threat:triage:score in metaalerts.
>>>
>>> Is it worth considering converting these to internal Metron fields so that
>>> they stay constant and this isn't a problem in the future? I could see
>>> these fields following the same pattern as 'metron_alert'. However this
>>> would cause pain when upgrading because existing data would need to be
>>> updated with these new fields.
>>>
>>> Just an idea. Curious if others have an opinion on the subject.
---
Thank you,
James Sirota
PMC- Apache Metron
jsirota AT apache DOT org