Hi everyone,

Thanks Tedd for this initiative. 2020 has been a fun year in many ways for 
many of us, but I found at least some time to spend on XMPP.


## Monitoring ##

I launched https://observe.jabber.network, a free monitoring service for
federated XMPP services.

This service is offered free of charge to anyone who wants their XMPP server 
monitored externally, provided they both federate and are not a known source 
of spam. You can read the high-level overview on the website, so I’ll go into 
a bit of detail on the background and technical stack here.

### Monitoring stack ###

Software-wise, this uses a pair of Prometheus [1] servers (which I had running
anyway) together with the standard Alertmanager which goes along with
Prometheus. Prometheus is the time-series database which collects the uptime
metrics and generates alerts based on them. The Alertmanager is responsible
for converting these alerts into emails which are then sent (based on the
labels on the metrics) to the corresponding domain owners.

The part most closely related to XMPP itself is the tool which does the actual
"upness" probing. That is the xmpp-blackbox-exporter [2], which I created
based on the Go XMPP library Mellium (already mentioned elsewhere in this
thread :)). [Why Go when you are the developer of aioxmpp? Simple: Prometheus
tooling is tightly geared around Go, and Mellium is better suited for these
low-level tasks.]

The xmpp-blackbox-exporter supports probing services from the "outside". That
includes federated XEP-0199 pings using pre-provisioned accounts as well as
probes on the c2s/s2s ports (either discovered via SRV records or given
explicitly). It then generates a set of metrics in the Prometheus format for
the Prometheus server to collect and store as time series. In addition to the
"yes/no" tests (metrics valued 1 or 0, such as "does it ping?" or "does it
successfully connect via c2s?"), the exporter also provides the timings of the
individual connection phases as well as the timestamp of the next certificate
expiry in the certificate chain.
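
To make the shape of such probe output a bit more concrete, here is a minimal Go sketch of how a single probe result could be rendered in the Prometheus text exposition format. It is only an illustration: the `ProbeResult` type, the `Render` and `earliestExpiry` helpers and all `example_*` metric names are made up for this sketch and are not taken from xmpp-blackbox-exporter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// ProbeResult holds what one probe run observed. The type and its fields are
// hypothetical and only serve this illustration.
type ProbeResult struct {
	Success      bool               // did the probe succeed at all?
	PhaseSeconds map[string]float64 // duration per connection phase
	CertNotAfter []time.Time        // NotAfter of each certificate in the chain
}

// earliestExpiry returns the soonest NotAfter in the chain. The boolean is
// false if the chain is empty.
func earliestExpiry(notAfter []time.Time) (time.Time, bool) {
	if len(notAfter) == 0 {
		return time.Time{}, false
	}
	earliest := notAfter[0]
	for _, t := range notAfter[1:] {
		if t.Before(earliest) {
			earliest = t
		}
	}
	return earliest, true
}

// Render formats the result as Prometheus text exposition lines. The metric
// names are placeholders, not the exporter's real ones.
func Render(r ProbeResult) string {
	var b strings.Builder

	// The "yes/no" metric: 1 on success, 0 on failure.
	up := 0
	if r.Success {
		up = 1
	}
	fmt.Fprintf(&b, "example_probe_success %d\n", up)

	// One sample per connection phase, sorted for stable output.
	phases := make([]string, 0, len(r.PhaseSeconds))
	for p := range r.PhaseSeconds {
		phases = append(phases, p)
	}
	sort.Strings(phases)
	for _, p := range phases {
		fmt.Fprintf(&b, "example_probe_duration_seconds{phase=%q} %g\n", p, r.PhaseSeconds[p])
	}

	// Next certificate expiry in the chain, as a Unix timestamp.
	if t, ok := earliestExpiry(r.CertNotAfter); ok {
		fmt.Fprintf(&b, "example_probe_cert_earliest_expiry_seconds %d\n", t.Unix())
	}
	return b.String()
}

func main() {
	now := time.Now()
	fmt.Print(Render(ProbeResult{
		Success:      true,
		PhaseSeconds: map[string]float64{"connect": 0.25, "starttls": 0.5},
		CertNotAfter: []time.Time{now.Add(90 * 24 * time.Hour), now.Add(30 * 24 * time.Hour)},
	}))
}
```

Exposing the expiry as an absolute timestamp rather than "days left" keeps the probe stateless; the "how long until it expires" arithmetic can then be done on the monitoring side.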

Then I have a bunch of alerting rules which trigger on thresholds in those 
metrics (think "the average of the 'is up' metric over the last 5 minutes for 
a domain is less than 0.5 -> raise an alert") and those are forwarded to the 
alertmanager for email distribution.
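
The threshold logic in that example can be sketched in a few lines of Go. In Prometheus itself this kind of rule is written as an expression (an `avg_over_time` over a 5m range compared against 0.5); the code below only illustrates the arithmetic and is not how the service is implemented. The `Sample` type and the `ShouldAlert` function are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Sample is one observation of the "is up" metric (hypothetical type).
type Sample struct {
	At time.Time
	Up float64 // 1 = probe succeeded, 0 = probe failed
}

// ShouldAlert reports whether the average of Up over the samples inside
// (now-window, now] is below threshold. With no samples in the window this
// sketch stays silent; a real setup would likely alert on missing data too.
func ShouldAlert(samples []Sample, now time.Time, window time.Duration, threshold float64) bool {
	cutoff := now.Add(-window)
	var sum float64
	var n int
	for _, s := range samples {
		if s.At.After(cutoff) && !s.At.After(now) {
			sum += s.Up
			n++
		}
	}
	if n == 0 {
		return false
	}
	return sum/float64(n) < threshold
}

func main() {
	now := time.Now()
	samples := []Sample{
		{At: now.Add(-4 * time.Minute), Up: 0},
		{At: now.Add(-3 * time.Minute), Up: 0},
		{At: now.Add(-2 * time.Minute), Up: 1},
		{At: now.Add(-1 * time.Minute), Up: 0},
	}
	// Average is 0.25, which is below 0.5, so this prints: alert=true
	fmt.Printf("alert=%v\n", ShouldAlert(samples, now, 5*time.Minute, 0.5))
}
```

Averaging over a window instead of reacting to a single failed probe means one lost ping does not immediately result in an email.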

### Why? ###

I built xmpp-blackbox-exporter mainly to monitor my own infrastructure. Then 
it occurred to me that I have these two redundant monitoring servers which are 
more or less idle anyway. So I thought I’d share. That’s really it.

### Success? ###

At this time, there are 25 different domains belonging to 16 different
entities registered. I can say with reasonable certainty that the service has
already prevented one or two outages due to expired certificates (as it also
sends warnings 7 days ahead) and has given operators information to
investigate a problem or outage further.

As it doesn’t really cost me much, I already count that as a success. But the
system still has some capacity, so if you need some monitoring … (also, if you 
asked me in the past and got ignored… just ping me again, 2020 has been a fun 
year ;)).


I think that’s the most notable (and really the only notable) thing I did in
the XMPP context in 2020. Let’s see what 2021 brings!

Kind regards,
Jonas


   [1]: https://prometheus.io/
   [2]: https://github.com/horazont/xmpp-blackbox-exporter

_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
_______________________________________________