mysqld_exporter contains some useful rules for checking heartbeat lag:

groups:
- name: example.rules
  rules:
  - record: mysql_heartbeat_lag_seconds
    expr: mysql_heartbeat_now_timestamp_seconds - mysql_heartbeat_stored_timestamp_seconds
   ...
  - alert: MySQLReplicationLag
    expr: (mysql_heartbeat_lag_seconds > 30) and ON(instance) (predict_linear(mysql_heartbeat_lag_seconds[5m], 60 * 2) > 0)


Now, in my case the master server_id may change due to the way we operate
our MySQL cluster, and hence we may get the following metrics:

{instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",server_id="2001500"}  0.5187849998474121
{instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",server_id="3212"}     1594051555.519615


As you can see, there are multiple series for one instance, and only the
one whose server_id label matches the current master is correct. In
principle, it's easy to determine the correct one, because there is also a
metric, mysql_slave_status_master_server_id, whose value is the correct
server_id:

mysql_slave_status_master_server_id{instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",master_host="dbmaster001",master_uuid="005e9c3d-baea-11ea-ab06-027e6d15fde3"}  2001500

So for the alert definition I would have to take the server_id into account:

  - alert: MySQLReplicationLag
    expr: (mysql_heartbeat_lag_seconds{server_id="2001500"} > 30) and ON(instance) ...

But how can I do this in my case, where the server_id label has to be
compared with the value of another metric
(mysql_slave_status_master_server_id)?
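
One direction that might work, which I have not verified against our setup:
the count_values aggregation turns a sample value into a label, so the value
of mysql_slave_status_master_server_id could become a server_id label and
then be matched against the lag series. The sketch below assumes that the
value is rendered as exactly the same string as the server_id label (e.g.
"2001500"), and it shows only the threshold part of the alert:

```yaml
groups:
- name: example.rules
  rules:
  - alert: MySQLReplicationLag
    # count_values produces one series per (instance, server_id), where
    # server_id is the stringified value of the master server id metric.
    # "and on (instance, server_id)" then keeps only the lag series whose
    # server_id label equals that value (assumption: the strings match).
    expr: |
      (
        mysql_heartbeat_lag_seconds
          and on (instance, server_id)
        count_values by (instance) ("server_id", mysql_slave_status_master_server_id)
      ) > 30
```

If this holds, the series for server_id="3212" would drop out because no
matching (instance, server_id) pair exists on the right-hand side.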
