Hi, all! 👋 I've opened a pull request for the Alertmanager here <https://github.com/prometheus/alertmanager/pull/3353> that adds support for parsing UTF-8 matchers while also adding restrictions on ambiguous matchers that are permitted in the current version of Alertmanager.
https://github.com/prometheus/alertmanager/pull/3353 *Background* As explained in #3319 <https://github.com/prometheus/alertmanager/issues/3319>, Alertmanager constrains label names to the characters ^[a-zA-Z_][a-zA-Z0-9_]*$ - the same as Prometheus. However, because Alertmanager is such a well known and popular open source project it makes sense to use it for other kinds of alert generators too, and not just Prometheus. An example of this is Grafana, where Alertmanager is used to manage alerts created from both Prometheus and non-Prometheus like datasources, including Graphite, InfluxDB, SQL, Loki and more. The issue with this though is that these other datasources do not share the same constraints as Prometheus when it comes to label names, and so using them with Alertmanager requires some kind of normalization to ensure label matchers match the set of allowed characters before the alert can be sent to the Alertmanager. *What this pull request does* @yuri-tceretian <https://github.com/yuri-tceretian> has proposed adding support for UTF-8 characters in #3321 <https://github.com/prometheus/alertmanager/pull/3321> that updates the existing regular expression to support double quoted UTF-8 sequences as label names, while keeping unquoted label names the same. In this pull request I wanted to propose a different option to where we would add a "simple" LL(1) parser to Alertmanager to parse matchers instead. *Motivation* The original motivation for writing this parser was to add support for matching label names containing . and spaces to grafana/grafana <https://github.com/grafana/grafana>. However, about the same time I learned that Prometheus maintainers agreed to add support for UTF-8 labels in Alertmanager <https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit#bookmark=id.wm24eue9w2eg>, and so I decided to further the work to see if it could be upstreamed to Alertmanager instead. The original source code can be found at grobinson-grafana/matchers <https://github.com/grobinson-grafana/matchers>. *Breaking changes* *Expressions must start and end with open and closing braces* All expressions must start and end with { and }, although this can be relaxed if required. For example foo=bar is not valid, it must be {foo=bar}. *Trailing commas are not permitted* Trailing commas are not permitted. For example {foo=bar,} is not valid, it must be {foo=bar}. *All non [a-zA-Z_:][a-zA-Z0-9_:]* values must be double quoted* The set of unquoted characters is now the same on both sides of the expression. In other words, both label names and label values without double quotes must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. For example {foo=!bar} is not valid, it must be {foo="!bar"}. In current versions of Alertmanager, unquoted label values can contain all UTF-8 code points with the exception of comma, such as {foo=!bar}. There are two reasons for this: 1. It's no longer possible to write ambiguous matchers which I feel is something Alertmanager should fix. For example is {foo=~} equivalent to {foo="~"} or {foo=~""}? 2. If we restrict the =, !, ~ characters to double quotes we can keep the grammar LL(1). Without this restriction lookahead/backtrack is required to parse matchers such as {foo==~!=!~bar} which are valid in current versions of Alertmanager. *Errors* One of the goals with this LL(1) parser is to provide better error messages than what is possible using just a regular expression. This work is still in progress as error messages are not as useful as I would like for EOF, but here is an example of what it looks like for non-EOF cases: {foo=bar 🙂} 9:13: 🙂: invalid input: expected comma or closing '}' {foo with spaces=bar with spaces} 5:9: unexpected with: expected an operator such as '=', '!=', '=~' or '!~' *Benchmarks* I've also provided a number of benchmarks of both the LL(1) parser and regex parser that supports UTF-8. These can be found at grobinson-grafana/matchers-benchmarks <https://github.com/grobinson-grafana/matchers-benchmarks>. However, to run them go.mod must be updated to use the branch https://github.com/grafana/prometheus-alertmanager/tree/yuri-tceretian/utf-8-label-names here. BenchmarkMatchersSimple, BenchmarkPrometheusSimple {foo="bar"} BenchmarkMatchersComplex, BenchmarkPrometheusComplex {foo="bar",bar="foo 🙂","baz"!=qux,qux!="baz 🙂"} BenchmarkMatchersRegexSimple, BenchmarkPrometheusRegexSimple {foo=~"[a-zA-Z_:][a-zA-Z0-9_:]*"} BenchmarkMatchersRegexComplex, BenchmarkPrometheusRegexComplex {foo=~"[a-zA-Z_:][a-zA-Z0-9_:]*",bar=~"[a-zA-Z_:]","baz"!~"[a-zA-Z_:][a-zA-Z0-9_:]*",qux!~"[a-zA-Z_:]"} go test -bench=. -benchmem goos: darwin goarch: arm64 pkg: github.com/grobinson-grafana/matchers-benchmarks BenchmarkMatchersRegexSimple-8 488295 2425 ns/op 3248 B/op 49 allocs/op BenchmarkMatchersRegexComplex-8 138081 9074 ns/op 11448 B/op 169 allocs/op BenchmarkPrometheusRegexSimple-8 329244 3496 ns/op 3531 B/op 58 allocs/op BenchmarkPrometheusRegexComplex-8 95188 12554 ns/op 12619 B/op 204 allocs/op BenchmarkMatchersSimple-8 2888340 414.9 ns/op 56 B/op 2 allocs/op BenchmarkMatchersComplex-8 741590 1628 ns/op 248 B/op 7 allocs/op BenchmarkPrometheusSimple-8 1919209 613.9 ns/op 233 B/op 8 allocs/op BenchmarkPrometheusComplex-8 425430 2803 ns/op 1015 B/op 31 allocs/op PASS ok github.com/grobinson-grafana/matchers-benchmarks 11.766s -- You received this message because you are subscribed to the Google Groups "Prometheus Developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/f0ad49cb-5e5c-41db-87ca-10f056a4c479n%40googlegroups.com.