I have been experimenting with the unit-testing capabilities provided by
promtool and have run into a few issues/gotchas that I can't quite make
sense of.
Example code:

rule_files:
  - ../nodelocal-cache.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    external_labels:
      cluster: test
    input_series:
      - series: 'coredns_nodecache_setup_errors_total{pod="unit-test", errortype="configmap"}'
        values: '1 2 3 4 5 6 7 8 9 10'
      - series: 'coredns_dns_response_rcode_count_total{job="nodelocal-dns", rcode="SERVFAIL", zone="."}'
        values: '0 60 120 180 240 300 360 420 480 540'
      - series: 'coredns_dns_response_rcode_count_total{job="nodelocal-dns", rcode="NOERROR", zone="."}'
        values: '0 120 240 360 480 600 720 840 960 1080'
    promql_expr_test:
      - expr: rate(coredns_nodecache_setup_errors_total{}[5m])
        eval_time: 5m
        exp_samples:
          - labels: '{pod="unit-test", errortype="configmap"}'
            value: 1.6666666666666666E-02
      - expr: rate(coredns_dns_response_rcode_count_total{}[5m])
        eval_time: 10m
        exp_samples:
          - labels: '{job="nodelocal-dns", rcode="SERVFAIL", zone="."}'
            value: 1
          - labels: '{job="nodelocal-dns", rcode="NOERROR", zone="."}'
            value: 2
    alert_rule_test:
      - eval_time: 6m
        alertname: NodeLocalDNSSetupErrorsHigh
        exp_alerts:
          - exp_labels:
              severity: critical
              alertname: NodeLocalDNSSetupErrorsHigh
              errortype: configmap
              pod: unit-test
            exp_annotations:
              description: test:unit-test There are configmap errors setting up NodeLocalDNS
              summary: NodeLocalDNS setup errors on test:unit-test
----

groups:
  - name: NodeLocalDNS
    rules:
      - alert: NodeLocalDNSSetupErrorsHigh
        labels:
          severity: critical
        for: 5m
        expr: |
          rate(coredns_nodecache_setup_errors_total{}[5m]) > 0
        annotations:
          summary: "NodeLocalDNS setup errors on {{ $externalLabels.cluster }}:{{ $labels.pod }}"
          description: "{{ $externalLabels.cluster }}:{{ $labels.pod }} There are {{ $labels.errortype }} errors setting up NodeLocalDNS"
As you can see, I run the PromQL test
*rate(coredns_nodecache_setup_errors_total{}[5m])*, which evaluates to
1.6666666666666666E-02 (about 0.0167). NodeLocalDNSSetupErrorsHigh should
therefore fire once that value has been above 0 for 5 minutes, yet the alert
test only passes if I set eval_time to 6m; with eval_time set to 5m it fails
(the alert doesn't fire).

What is the relationship between the `for` duration in the alert rule itself
and the eval_time in the test?
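One way to pin down the boundary described above is to assert both sides of it in the same test. The fragment below is a minimal sketch, assuming the test's evaluation_interval of 1m and assuming the rule's expression first returns a result at the 1m evaluation: rate() needs at least two samples in its window, and the input series has only one sample at 0m. Under those assumptions the alert would become pending at 1m and would only have been active for the full 5m `for` duration at 6m. Leaving exp_alerts empty asks promtool to check that no NodeLocalDNSSetupErrorsHigh alert is firing at that eval_time.

```yaml
# Sketch: a replacement for the alert_rule_test section of the test above,
# placed under the same `tests:` entry. Assumes evaluation_interval: 1m and
# that the expression first becomes true at the 1m evaluation.
alert_rule_test:
  # At 5m the condition has (by assumption) held for only 4m, so the alert
  # should still be pending; an empty exp_alerts means "nothing firing".
  - eval_time: 5m
    alertname: NodeLocalDNSSetupErrorsHigh
    exp_alerts: []
  # At 6m the condition has held for the full `for: 5m`, so it should fire.
  - eval_time: 6m
    alertname: NodeLocalDNSSetupErrorsHigh
    exp_alerts:
      - exp_labels:
          severity: critical
          alertname: NodeLocalDNSSetupErrorsHigh
          errortype: configmap
          pod: unit-test
        exp_annotations:
          description: test:unit-test There are configmap errors setting up NodeLocalDNS
          summary: NodeLocalDNS setup errors on test:unit-test
```

Keeping both checks side by side makes the off-by-one-interval behaviour explicit in the test itself rather than only in the choice of eval_time.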
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/08a1f740-cbde-40f0-b20d-f1ed28b7f6d4n%40googlegroups.com.