mysqld_exporter contains some useful rules for checking heartbeat lag:

groups:
- name: example.rules
  rules:
  - record: mysql_heartbeat_lag_seconds
    expr: mysql_heartbeat_now_timestamp_seconds - mysql_heartbeat_stored_timestamp_seconds
  ...
  - alert: MySQLReplicationLag
    expr: (mysql_heartbeat_lag_seconds > 30) and ON(instance) (predict_linear(mysql_heartbeat_lag_seconds[5m], 60 * 2) > 0)

In my case, the master's server_id can change because of the way we operate
our MySQL cluster, so we may end up with the following series:

{instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",server_id="2001500"} 0.5187849998474121
{instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",server_id="3212"} 1594051555.519615

As you can see, there are multiple series for one instance, and only one of
them is the right one: the series whose server_id matches the current master.
In principle it is easy to determine the correct one, since there is also a
metric, mysql_slave_status_master_server_id, whose value is the correct
server_id:

mysql_slave_status_master_server_id{instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",master_host="dbmaster001",master_uuid="005e9c3d-baea-11ea-ab06-027e6d15fde3"} 2001500

So for the alert definition I would have to take the server_id into account:

- alert: MySQLReplicationLag
  expr: (mysql_heartbeat_lag_seconds{server_id="2001500"} > 30) and ON(instance) ...

But how do I do this in my case, where the server_id label has to be compared
with the value of another metric (mysql_slave_status_master_server_id)?
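
One direction that might work, offered only as an untested sketch: the
count_values aggregator turns each sample value into a label, so the value of
mysql_slave_status_master_server_id could become a server_id label to match
on. This assumes the value renders as the same string as the server_id label
on the heartbeat series (for example "2001500") and that instance is enough
to pair the two metrics. The rest of the original alert expression is left
out here.

```yaml
# Sketch only, not verified against a live Prometheus.
- alert: MySQLReplicationLag
  expr: |
    (
      # keep only the heartbeat series whose server_id label ...
      mysql_heartbeat_lag_seconds
      and on(instance, server_id)
      # ... equals the current master's server_id, which count_values
      # exposes as a label; without() keeps instance and the other labels
      count_values without() ("server_id", mysql_slave_status_master_server_id)
    ) > 30
```

If that matching works as assumed, the series with server_id="3212" above
would be dropped and only the one with server_id="2001500" would be compared
against the threshold.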