[prometheus-developers] [feature/proposal] UTF-8 support + restricting ambiguous label matchers

'George Robinson' via Prometheus Developers Tue, 09 May 2023 02:05:54 -0700

Hi, all! 👋

I've opened a pull request for the Alertmanager here 
<https://github.com/prometheus/alertmanager/pull/3353> that adds support 
for parsing UTF-8 matchers while also adding restrictions on 
ambiguous matchers that are permitted in the current version of 
Alertmanager.


https://github.com/prometheus/alertmanager/pull/3353

*Background*

As explained in #3319 
<https://github.com/prometheus/alertmanager/issues/3319>, Alertmanager 
constrains label names to the characters ^[a-zA-Z_][a-zA-Z0-9_]*$ - the 
same as Prometheus.

However, because Alertmanager is such a well known and popular open source 
project it makes sense to use it for other kinds of alert generators too, 
and not just Prometheus. An example of this is Grafana, where Alertmanager 
is used to manage alerts created from both Prometheus and non-Prometheus 
like datasources, including Graphite, InfluxDB, SQL, Loki and more. The 
issue with this though is that these other datasources do not share the 
same constraints as Prometheus when it comes to label names, and so using 
them with Alertmanager requires some kind of normalization to ensure label 
matchers match the set of allowed characters before the alert can be sent 
to the Alertmanager.

*What this pull request does*

@yuri-tceretian <https://github.com/yuri-tceretian> has proposed adding 
support for UTF-8 characters in #3321 
<https://github.com/prometheus/alertmanager/pull/3321> that updates the 
existing regular expression to support double quoted UTF-8 sequences as 
label names, while keeping unquoted label names the same.

In this pull request I wanted to propose a different option to where we 
would add a "simple" LL(1) parser to Alertmanager to parse matchers instead.

*Motivation*

The original motivation for writing this parser was to add support for 
matching label names containing . and spaces to grafana/grafana 
<https://github.com/grafana/grafana>. However, about the same time I 
learned that Prometheus maintainers agreed to add support for UTF-8 labels 
in Alertmanager 
<https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit#bookmark=id.wm24eue9w2eg>,
 
and so I decided to further the work to see if it could be upstreamed to 
Alertmanager instead.

The original source code can be found at grobinson-grafana/matchers 
<https://github.com/grobinson-grafana/matchers>.

*Breaking changes*

*Expressions must start and end with open and closing braces*

All expressions must start and end with { and }, although this can be 
relaxed if required. For example foo=bar is not valid, it must be {foo=bar}.

*Trailing commas are not permitted*

Trailing commas are not permitted. For example {foo=bar,} is not valid, it 
must be {foo=bar}.

*All non [a-zA-Z_:][a-zA-Z0-9_:]* values must be double quoted*

The set of unquoted characters is now the same on both sides of the 
expression. In other words, both label names and label values without 
double quotes must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. 
For example {foo=!bar} is not valid, it must be {foo="!bar"}. In current 
versions of Alertmanager, unquoted label values can contain all UTF-8 code 
points with the exception of comma, such as {foo=!bar}.

There are two reasons for this:

   1. 
   
   It's no longer possible to write ambiguous matchers which I feel is 
   something Alertmanager should fix. For example is {foo=~} equivalent to 
   {foo="~"} or {foo=~""}?
   2. 
   
   If we restrict the =, !, ~ characters to double quotes we can keep the 
   grammar LL(1). Without this restriction lookahead/backtrack is required to 
   parse matchers such as {foo==~!=!~bar} which are valid in current 
   versions of Alertmanager.
   
*Errors*

One of the goals with this LL(1) parser is to provide better error messages 
than what is possible using just a regular expression. This work is still 
in progress as error messages are not as useful as I would like for EOF, 
but here is an example of what it looks like for non-EOF cases:
{foo=bar 🙂} 9:13: 🙂: invalid input: expected comma or closing '}' {foo 
with spaces=bar with spaces} 5:9: unexpected with: expected an operator 
such as '=', '!=', '=~' or '!~'
*Benchmarks*

I've also provided a number of benchmarks of both the LL(1) parser and 
regex parser that supports UTF-8. These can be found at 
grobinson-grafana/matchers-benchmarks 
<https://github.com/grobinson-grafana/matchers-benchmarks>. However, to run 
them go.mod must be updated to use the branch 
https://github.com/grafana/prometheus-alertmanager/tree/yuri-tceretian/utf-8-label-names
 here.
BenchmarkMatchersSimple, BenchmarkPrometheusSimple {foo="bar"} 
BenchmarkMatchersComplex, BenchmarkPrometheusComplex {foo="bar",bar="foo 
🙂","baz"!=qux,qux!="baz 🙂"} BenchmarkMatchersRegexSimple, 
BenchmarkPrometheusRegexSimple {foo=~"[a-zA-Z_:][a-zA-Z0-9_:]*"} 
BenchmarkMatchersRegexComplex, BenchmarkPrometheusRegexComplex 
{foo=~"[a-zA-Z_:][a-zA-Z0-9_:]*",bar=~"[a-zA-Z_:]","baz"!~"[a-zA-Z_:][a-zA-Z0-9_:]*",qux!~"[a-zA-Z_:]"}
 

go test -bench=. -benchmem goos: darwin goarch: arm64 pkg: 
github.com/grobinson-grafana/matchers-benchmarks 
BenchmarkMatchersRegexSimple-8 488295 2425 ns/op 3248 B/op 49 allocs/op 
BenchmarkMatchersRegexComplex-8 138081 9074 ns/op 11448 B/op 169 allocs/op 
BenchmarkPrometheusRegexSimple-8 329244 3496 ns/op 3531 B/op 58 allocs/op 
BenchmarkPrometheusRegexComplex-8 95188 12554 ns/op 12619 B/op 204 
allocs/op BenchmarkMatchersSimple-8 2888340 414.9 ns/op 56 B/op 2 allocs/op 
BenchmarkMatchersComplex-8 741590 1628 ns/op 248 B/op 7 allocs/op 
BenchmarkPrometheusSimple-8 1919209 613.9 ns/op 233 B/op 8 allocs/op 
BenchmarkPrometheusComplex-8 425430 2803 ns/op 1015 B/op 31 allocs/op PASS 
ok github.com/grobinson-grafana/matchers-benchmarks 11.766s

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/f0ad49cb-5e5c-41db-87ca-10f056a4c479n%40googlegroups.com.

[prometheus-developers] [feature/proposal] UTF-8 support + restricting ambiguous label matchers

Reply via email to