On Sat, 5 Dec 2015, Peter Portante wrote:
On Sat, Dec 5, 2015 at 5:03 PM, David Lang <[email protected]> wrote:
we really need mmscrubnames or similar
1. change all names to lower case
2. replace characters that rsyslog doesn't allow in names with something
3. allow other characters to be added to the list to be replaced
4. change names that are foo!bar into multi-layer structures
5. handle the case where these changes create multiple objects with the
same name (probably by appending a string until there are no longer
conflicts)
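
To make the five steps concrete, here is a rough Python sketch of what such a scrubbing pass might do. It is only an illustration under stated assumptions, not a design for the actual module: the function names, the set of allowed characters (ASCII letters, digits, '_' and '-'), the '_' replacement, and the '_' conflict suffix are all hypothetical choices.

```python
def scrub_segment(segment, extra_illegal="", replacement="_"):
    """Steps 1-3: lower-case, then replace disallowed characters."""
    out = []
    for ch in segment.lower():
        # Assumption: only ASCII letters, digits, '_' and '-' are allowed.
        allowed = ch.isascii() and (ch.isalnum() or ch in "_-")
        if ch in extra_illegal or not allowed:
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)


def unique_name(container, name, suffix="_"):
    """Step 5: append a suffix until the name no longer conflicts."""
    while name in container:
        name += suffix
    return name


def scrub(obj, extra_illegal=""):
    """Return a copy of a dict with all names scrubbed (steps 1-5)."""
    result = {}
    for raw_name, value in obj.items():
        if isinstance(value, dict):
            value = scrub(value, extra_illegal)
        # Step 4: 'foo!bar' becomes {'foo': {'bar': ...}}.
        parts = [scrub_segment(p, extra_illegal) for p in raw_name.split("!")]
        node = result
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                # Name is missing, or taken by a non-container value.
                part = unique_name(node, part)
                child = node[part] = {}
            node = child
        node[unique_name(node, parts[-1])] = value
    return result


scrubbed = scrub({"Foo": 1, "foo": 2, "a.b": 3, "x!Y": 4})
# scrubbed == {"foo": 1, "foo_": 2, "a_b": 3, "x": {"y": 4}}
```

Passing extra_illegal="-" would additionally turn "a-b" into "a_b", which is the kind of knob step 3 asks for.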
#1 may be able to go away in a decade or so if we allow case sensitive
names as an option
Don't we need to make this go away sooner rather than later? If rsyslog is the
link in the chain that prevents someone from getting the key names they
expect into ES, won't they find something else to replace that link?
I have made available RPMs for EPEL 7 (which should work on RHEL 7 and
CentOS 7), and Fedora 21, 22, and 23. Why not make the effort to find out
what breaks, and put in a switch so that folks can opt-in to case-sensitive
names in config files? I'd be happy to implement the switch, but would
need help verifying existing configurations work.
this will break some existing configs, won't it? If someone has something that's
assuming everything is squished to lower case, and it becomes case sensitive,
won't that break?
We can add the new case sensitivity as an option quickly, but can't make it the
default for quite a while (a cycle or two of the enterprise distros)
#2 needs to be done on the actual variable names, not just on the ES output
so that the variables can be accessed and manipulated in rsyslog
Why do we need to do this? Is this because we need to reference them in
the configuration files? If so, why not provide an escape syntax for the
configuration file?
Do we really want rsyslog in the position where it adds restrictions to the
data handling pipeline because of how it operates? I think we all agree
that an mmscrubnames module would be good to help put rsyslog in the
position of transforming data from one source to another in the overall
pipeline.
AFAIK, JSON imposes no limits on field names, so any strange character (or
unicode character, or even control character) could be part of a field name. And
even if the JSON spec imposes some limits, do the libraries impose such limits
in practice?
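
As one data point on the library question (a single library, not a survey), Python's standard json module accepts all of these names and round-trips them unchanged. The sample document is made up for illustration.

```python
import json

# Field names with a dot, a '!', a non-ASCII character, a control
# character (tab, written as the JSON escape \t), and the empty string.
text = r'{"a.b": 1, "x!y": 2, "caf\u00e9": 3, "tab\there": 4, "": 5}'

doc = json.loads(text)
names = list(doc)

# Serializing and parsing again gives back the same names.
round_tripped = json.loads(json.dumps(doc))
```

So a parser by itself cannot be relied on to reject odd names; any restriction has to be applied as a separate step.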
I don't think it makes sense to support all of this in rsyslog, I think it's
reasonable to impose something sane. Other log handling software does this (for
example, logstash doesn't allow '.' in the name, but also is case insensitive
:-)
and finally, #4 is needed to allow the work-around for problems like ES
has.
I am not sure I follow why this allows us to work around problems like ES
has.
The dots in field names are confusing and ambiguous in ES because you can
reference a hierarchical set of objects in the json objects indexed. So if
one has a field name with dots in it in one document and another document
in the index has a hierarchy with sub objects, then it is ambiguous which
we are dealing with, if I understand the problem correctly.
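
A small Python sketch of the ambiguity as described above. It only illustrates why the two documents become indistinguishable once paths are written with dots; it makes no claim about how ES handles this internally. The helper name and the sample documents are hypothetical.

```python
def dotted_paths(doc, prefix=""):
    """Flatten a nested dict into {dotted.path: value}."""
    paths = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.update(dotted_paths(value, path))
        else:
            paths[path] = value
    return paths


# One document uses a dot inside the field name...
flat_doc = {"user.name": "alice"}
# ...the other has a real hierarchy with a sub-object.
nested_doc = {"user": {"name": "alice"}}

# Both end up addressed by the same path, "user.name".
flat_paths = dotted_paths(flat_doc)
nested_paths = dotted_paths(nested_doc)
```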
Ok, that explains why this is an issue, it makes sense. We have the same problem
with '!'. It's a problem in ES because it's a new requirement, breaking existing
input.
But #4 would let us say that '.' is an illegal character, along with control
characters, anything above plain ASCII, and other punctuation characters we
don't allow and get them replaced by something we do allow.
Folks can stay with ES 1.7 if they need the dots in names.
not long term.
David Lang